Interdisciplinary Sciences: Computational Life Sciences: Latest Articles, Page 6

Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-26 DOI: 10.1007/s12539-024-00654-7

Yaowen Gu, Zidu Xu, Carl Yang

Computational drug repositioning, which predicts drug-disease associations (DDAs), offers significant potential for discovering new drug indications. Current methods apply graph neural networks (GNNs) to drug-disease heterogeneous networks to predict DDAs and achieve notable performance compared with traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of available biomedical knowledge. Large language models (LLMs), in contrast, excel at graph search and relational reasoning, which may help integrate comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation to drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text into continuous numerical representations. We then propose LLM-DDA with three model architectures (LLM-DDA_{Node Feat}, LLM-DDA_{Dual GNN}, LLM-DDA_{GNN-AE}) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA_{GNN-AE} achieved the best performance compared with 11 baselines, with overall relative improvements of 23.22% in AUPR, 17.20% in F1-score, and 25.35% in precision. Meanwhile, selected case studies involving prednisone and allergic rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning, as well as their generality and applicability to other biomedical relation prediction tasks.
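The abstract reports results as relative improvements in AUPR, F1-score, and precision. As a rough illustration of what those quantities mean, the sketch below computes average precision (a common step-wise estimate of AUPR) and a relative improvement. The function names and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    # Rank candidate drug-disease pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / hits if hits else 0.0


def relative_improvement(new, baseline):
    """Gain of `new` over `baseline`, expressed as a fraction of the baseline."""
    return (new - baseline) / baseline


# Hypothetical toy data: 1 marks a known association, 0 an unlabeled pair.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(f"AP = {average_precision(labels, scores):.4f}")
print(f"relative gain = {relative_improvement(0.50, 0.40):.2%}")
```

A relative improvement divides the gain by the baseline value, so an AUPR that rises from 0.40 to 0.50 is a 25% relative improvement even though the absolute gain is 0.10.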



Citations: 0

Bilinear Perceptual Fusion Algorithm Based on Brain Functional and Structural Data for ASD Diagnosis and Regions of Interest Identification

IF 4.8 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-10 DOI: 10.1007/s12539-024-00651-w

Jinxiong Fang, Da-fang Zhang, Kun Xie, Luyun Xu, Xia-an Bi

Autism spectrum disorder (ASD) is a serious mental disorder with complex pathogenesis and variable presentation among individuals. Although many deep learning algorithms have been used to diagnose ASD, most focus on a single data modality, resulting in limited information extraction and poor stability. In this paper, we propose a bilinear perceptual fusion (BPF) algorithm that leverages data from multiple modalities. In our algorithm, different schemes are used to extract features according to the characteristics of functional and structural data. Bilinear operations capture the associations between the functional and structural features of each region of interest (ROI), and these associations are then used to integrate the feature representations. Because graph convolutional neural networks (GCNs) can effectively utilize topology and node features in brain network analysis, we design a deep learning framework called BPF-GCN and conduct experiments on a publicly available ASD dataset. The results show that BPF-GCN reached a classification accuracy of 82.35%, surpassing existing methods, and that the framework can extract ROIs related to ASD. Our work provides a valuable reference for the timely diagnosis and treatment of ASD.

Graphical Abstract

Based on the extracted functional and structural features, we design a generic framework called BPF-GCN that can both diagnose ASD and identify pathogenic ROIs. BPF-GCN consists of four parts: extraction of brain functional features, extraction of brain structural features, feature fusion, and classification.
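The abstract says that bilinear operations capture the association between each ROI's functional and structural features, but it does not give the exact formulation. A minimal sketch of one plausible reading follows: a bilinear form f^T W s scores the association, and that score gates the concatenated features. The weight matrix `W`, the sigmoid gate, and the function names are assumptions made for illustration, not the authors' BPF implementation.

```python
import math


def bilinear_score(f, W, s):
    """Bilinear form f^T W s between one ROI's two feature vectors."""
    return sum(f[i] * W[i][j] * s[j] for i in range(len(f)) for j in range(len(s)))


def fuse_roi(f, W, s):
    """Gate the concatenated features by the squashed association score."""
    # Assumption: a sigmoid gate; the paper may integrate features differently.
    gate = 1.0 / (1.0 + math.exp(-bilinear_score(f, W, s)))
    return [gate * x for x in list(f) + list(s)]


# Hypothetical 2-dimensional functional and structural features for one ROI.
functional = [1.0, 2.0]
structural = [3.0, 4.0]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned weight matrix
print(bilinear_score(functional, W, structural))  # 1*3 + 2*4 = 11.0
print(fuse_roi(functional, W, structural))
```

In a trained model `W` would be learned, and the fused per-ROI vectors would serve as node features for the graph convolutional layers.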



Citations: 0

Protein Multiple Conformation Prediction Using Multi-Objective Evolution Algorithm.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-01-08 DOI: 10.1007/s12539-023-00597-5

Minghua Hou, Sirong Jin, Xinyue Cui, Chunxiang Peng, Kailong Zhao, Le Song, Guijun Zhang

The breakthrough of AlphaFold2 and the publication of AlphaFold DB represent a significant advance in the prediction of static protein structures. However, AlphaFold2 models tend to represent a single static structure, and multiple-conformation prediction remains a challenge. In this work, we propose a method named MultiSFold, which uses a distance-based multi-objective evolutionary algorithm to predict multiple conformations. First, multiple energy landscapes are constructed using different competing constraints generated by deep learning. Next, an iterative modal exploration and exploitation strategy is designed to sample conformations, incorporating multi-objective optimization, geometric optimization, and structural similarity clustering. Finally, the final population is generated using a loop-specific sampling strategy to adjust the spatial orientations. MultiSFold was evaluated against state-of-the-art methods on a benchmark set of 80 protein targets, each characterized by two representative conformational states. Based on the proposed metric, MultiSFold achieves a success ratio of 56.25% in predicting multiple conformations, whereas AlphaFold2 achieves only 10.00%. This may indicate that conformational sampling combined with knowledge gained through deep learning has the potential to generate conformations spanning the range between different conformational states. In addition, MultiSFold was applied to 244 human proteins with low structural accuracy in AlphaFold DB to test whether it could further improve the accuracy of static structures. The experimental results demonstrate the performance of MultiSFold, with a TM-score 2.97% higher than that of AlphaFold2 and 7.72% higher than that of RoseTTAFold. The online server is available at http://zhanglab-bioinf.com/MultiSFold .
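Multi-objective optimization, as mentioned in the abstract, generally keeps the candidates that no other candidate beats on every objective at once. The sketch below shows only that generic non-domination filter, assuming each conformation is scored by a tuple of energies to be minimized. The scores are hypothetical, and MultiSFold's actual objectives and selection scheme are not described in the abstract.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (energy_1, energy_2) scores for five sampled conformations.
conformations = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(pareto_front(conformations))
```

Here (3, 3) and (4, 4) are dropped because (2, 2) is better on both objectives, while the three remaining points trade one energy off against the other.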


Pages : 519-531

Citations: 0

Predicting circRNA-RBP Binding Sites Using a Hybrid Deep Neural Network.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-02-21 DOI: 10.1007/s12539-024-00616-z

Liwei Liu, Yixin Wei, Zhebin Tan, Qi Zhang, Jianqiang Sun, Qi Zhao

Circular RNAs (circRNAs) are non-coding RNAs generated by back-splicing. They are involved in biological processes and human diseases through interactions with specific RNA-binding proteins (RBPs). Because traditional biological experiments are costly, computational methods have been proposed to predict circRNA-RBP interactions. However, these methods rely on a single type of feature extraction. We therefore propose a novel model called circ-FHN, which uses only circRNA sequences to predict circRNA-RBP interactions. The circ-FHN approach involves feature coding and a hybrid deep learning model. Feature coding takes into account the physicochemical properties of circRNA sequences and employs four coding methods to extract sequence features. The hybrid deep structure comprises a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BiGRU): the CNN learns high-level abstract features, while the BiGRU captures long-term dependencies in the sequence. To assess the effectiveness of circ-FHN, we compared it with other computational methods on 16 datasets and conducted ablation experiments and motif analysis. The results show that circ-FHN outperforms the other methods. circ-FHN is freely available at https://github.com/zhaoqi106/circ-FHN .
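The abstract mentions four coding methods for sequence features without naming them. To make the idea of sequence feature coding concrete, the sketch below shows two encodings that are common for RNA sequences, one-hot and k-mer frequency. Whether either is among the four used by circ-FHN is an assumption, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"


def one_hot(seq):
    """One row per nucleotide; unknown symbols such as N become all zeros."""
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


def kmer_frequencies(seq, k=2):
    """Normalized counts of every k-mer over ACGU, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing unknown symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical short fragment; real inputs would be much longer sequences.
print(one_hot("ACGU"))
print(kmer_frequencies("ACAC", k=2))
```

One-hot keeps positional order, which suits a CNN or a recurrent layer, while k-mer frequencies give a fixed-length composition summary. Combining several such views is one way to avoid depending on a single feature type.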


Pages : 635-648

Citations: 0

DOTAD: A Database of Therapeutic Antibody Developability.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-26 DOI: 10.1007/s12539-024-00613-2

Wenzhen Li, Hongyan Lin, Ziru Huang, Shiyang Xie, Yuwei Zhou, Rong Gong, Qianhu Jiang, ChangCheng Xiang, Jian Huang

The development of therapeutic antibodies is an important part of new drug discovery pipelines. Assessing an antibody's developability, that is, its suitability for large-scale production and therapeutic use, is a particularly important step in this process. Because experimental assays that assess antibody developability at large scale are expensive and time-consuming, computational methods have become a more efficient alternative. However, the antibody research community faces significant challenges due to the scarcity of readily accessible data on antibody developability, which is essential for training and validating computational models. To address this gap, DOTAD (Database Of Therapeutic Antibody Developability) has been built as the first database dedicated exclusively to the curation of therapeutic antibody developability information. DOTAD aggregates all available therapeutic antibody sequence data along with various developability metrics from the scientific literature, offering researchers a robust platform for data storage, retrieval, exploration, and downloading. In addition to serving as a comprehensive repository, DOTAD integrates a web-based interface that features state-of-the-art tools for assessing antibody developability, so users can both access critical data and conveniently analyze and interpret it. The DOTAD database represents a valuable resource for the scientific community, facilitating the advancement of therapeutic antibody research. It is freely accessible at http://i.uestc.edu.cn/DOTAD/ , providing an open data platform that supports the continuous growth and evolution of computational methods in the field of antibody development.


Pages : 623-634

Citations: 0

BDM: An Assessment Metric for Protein Complex Structure Models Based on Distance Difference Matrix.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-27 DOI: 10.1007/s12539-024-00622-1

Jiaqi Zhai, Wenda Wang, Ranxi Zhao, Daiwen Sun, Da Lu, Xinqi Gong

Protein complex structure prediction is an important problem in computational biology. While significant progress has been made for protein monomers, accurate evaluation of protein complexes remains challenging. Existing assessment methods in CASP lack dedicated metrics for evaluating complexes, and DockQ, a widely used metric, has some limitations. In this study, we propose a novel metric called BDM (Based on Distance difference Matrix) for assessing predicted protein complex structures. Our approach uses a distance difference matrix derived from comparing real and predicted protein structures and establishes a linear correlation with root mean square deviation (RMSD). BDM overcomes limitations associated with receptor-ligand differentiation and eliminates the requirement for structure alignment, making it a more effective and efficient metric. Evaluation of BDM on the CASP14 and CASP15 test sets demonstrates superior performance compared with the official CASP scoring. BDM provides accurate and reasonable assessments of predicted protein complexes. Wide adoption of BDM has the potential to advance protein complex structure prediction and facilitate related research across scientific domains. Code is available at http://mialab.ruc.edu.cn/BDMServer/ .
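A distance difference matrix compares the internal pairwise distances of two structures, which is why no superposition is needed: rotating or translating either structure leaves its distances unchanged. The sketch below illustrates that property with a mean absolute difference as the summary score. The summary statistic, the function names, and the three-point toy coordinates are assumptions, and the actual BDM scoring function is not specified in the abstract.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between atoms (e.g. C-alpha positions)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_difference_matrix(reference, model):
    """Element-wise absolute difference of the two internal distance matrices."""
    if len(reference) != len(model):
        raise ValueError("structures must contain the same matched atoms")
    d_ref, d_mod = distance_matrix(reference), distance_matrix(model)
    n = len(reference)
    return [[abs(d_ref[i][j] - d_mod[i][j]) for j in range(n)] for i in range(n)]


def mean_abs_difference(ddm):
    """Average over the upper triangle; a hypothetical summary, not the BDM score."""
    n = len(ddm)
    pairs = [ddm[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pairs) / len(pairs)


reference = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
# Same shape, rotated 90 degrees about z and shifted by (1, 1, 1).
moved = [(1, 1, 1), (1, 4, 1), (-3, 1, 1)]
# Third atom displaced by one unit along y.
perturbed = [(0, 0, 0), (3, 0, 0), (0, 5, 0)]

print(mean_abs_difference(distance_difference_matrix(reference, moved)))
print(mean_abs_difference(distance_difference_matrix(reference, perturbed)))
```

The rigidly moved copy scores zero without any alignment step, while the perturbed copy scores above zero.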


Pages : 677-687

Citations: 0

Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-08 DOI: 10.1007/s12539-024-00615-0

Qian Deng, Jing Zhang, Jie Liu, Yuqi Liu, Zong Dai, Xiaoyong Zou, Zhanchao Li

As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate disease pathogenesis and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, and networks of disease semantic similarity, disease symptom similarity, and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine the different phosphorylation site and disease similarity information, the random walk with restart algorithm was used to obtain the topology information of each network. The diffusion component analysis method was then used to obtain comprehensive phosphorylation site similarity and disease similarity. Meanwhile, reliable negative samples were screened using a Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Under tenfold cross-validation, the model achieved an accuracy of 93.48%, a specificity of 96.82%, a sensitivity of 90.15%, a precision of 96.62%, a Matthews correlation coefficient of 0.8719, an area under the receiver operating characteristic curve of 0.9786, and an area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified against the literature and databases. These results show that the proposed method has outstanding prediction performance and high practical value.
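The evaluation indicators listed in the abstract are all functions of the confusion matrix. The sketch below computes them from the standard definitions. The counts in the example are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes at least one actual positive, one actual negative, and one
    predicted positive, so that no ratio divides by zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on positive samples
        "specificity": tn / (tn + fp),   # recall on negative samples
        "precision": tp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical fold with 100 positive and 100 negative samples.
for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {value:.4f}")
```

With equal numbers of positive and negative samples, accuracy is the mean of sensitivity and specificity.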

Pages: 649-664

Citations: 0

SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-12 DOI: 10.1007/s12539-024-00611-4

Qingzu He, Huan Guo, Yulin Li, Guoqiang He, Xiang Li, Jianwei Shuai

Mass spectrometry is crucial in proteomics analysis. Data-independent acquisition (DIA) in particular provides reliable and reproducible mass spectrometry data with broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep learning software package for DIA proteome analysis, generates peptide results that may include low-confidence peptides. Conventionally, biologists have to manually screen peptide fragment extracted ion chromatogram (XIC) peaks to identify high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm that aims to automate the identification of high-confidence peptides. Leveraging squeeze-and-excitation network and residual network models, SeFilter-DIA extracts XIC features and effectively distinguishes high-confidence from low-confidence peptides. Evaluation on the benchmark datasets shows that SeFilter-DIA achieves 99.6% AUC on the test set and 97% for other performance indicators. Furthermore, SeFilter-DIA is applicable to screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach to high-confidence peptide identification while mitigating the limitations of manual screening.
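
To make the squeeze-and-excitation idea concrete, here is a minimal sketch of a generic squeeze-and-excitation block applied to a multi-channel 1-D signal such as a set of fragment ion traces. The weights, the trace values, and the function name `squeeze_excite` are hypothetical. The abstract does not describe SeFilter-DIA's actual layer sizes or how it combines this block with the residual network.

```python
import math


def squeeze_excite(channels, w_reduce, w_expand):
    """Reweight each channel of a multi-channel 1-D signal with a learned gate."""
    # Squeeze: global average pooling over the time axis, one value per channel.
    pooled = [sum(ch) / len(ch) for ch in channels]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pooled))) for row in w_reduce]
    # Excitation, part 2: expansion layer followed by a sigmoid, one gate per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every time point of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


# Three hypothetical traces: two peak-shaped channels and one flat background channel.
traces = [
    [0.0, 2.0, 8.0, 2.0, 0.0],
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
w_reduce = [[0.5, 0.5, -1.0]]        # 3 channels -> 1 hidden unit (made-up weights)
w_expand = [[1.0], [1.0], [-1.0]]    # 1 hidden unit -> 3 gates (made-up weights)
scaled, gates = squeeze_excite(traces, w_reduce, w_expand)
print(gates)
```

With these hand-picked weights the two peak-shaped channels receive gates above 0.5 and the flat channel a gate below 0.5. In a trained network the weights would be learned rather than chosen by hand.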

Pages: 579-592

Citations: 0

CHL-DTI: A Novel High-Low Order Information Convergence Framework for Effective Drug-Target Interaction Prediction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-14 DOI: 10.1007/s12539-024-00608-z

Shudong Wang, Yingye Liu, Yuanyuan Zhang, Kuijie Zhang, Xuanmo Song, Yu Zhang, Shanchen Pang

Recognizing drug-target interactions (DTI) is a pivotal step in drug discovery. Traditional biological wet experiments, although valuable, are time-consuming and costly. Recently, computational methods grounded in network learning have demonstrated great advantages through effective topological feature extraction and have attracted extensive research attention. However, most existing network-based learning methods consider only the low-order binary correlation between an individual drug and target, neglecting the potential higher-order correlation information derived from multiple drugs and targets. High-order information is an essential component that complements low-order information. Hence, incorporating higher-order associations between drugs and targets, while adequately integrating them with the existing lower-order information, could potentially yield substantial breakthroughs in predicting drug-target interactions. We propose CHL-DTI, a novel dual-channel network-based learning model that converges high-order information from hypergraphs and low-order information from ordinary graphs for drug-target interaction prediction. The convergence of high-order and low-order information in CHL-DTI is manifested in two key aspects. First, during the feature extraction stage, the model integrates both high-level semantic information and low-level topological information by combining hypergraphs and ordinary graphs. Second, CHL-DTI fully fuses the newly introduced drug-protein pairs (DPP) hypergraph network structure with ordinary topological network structure information. Extensive experiments on three public datasets show that CHL-DTI outperforms state-of-the-art (SOTA) methods in DTI prediction tasks. The source code of CHL-DTI is available at https://github.com/UPCLyy/CHL-DTI.
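
The difference between pairwise edges and hyperedges can be sketched as follows. This is a simplified mean-aggregation illustration, not the hypergraph convolution used in CHL-DTI. The node features, the two hyperedges, and the function names are hypothetical.

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_propagate(features, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation."""
    # Each hyperedge summarizes all of its member nodes at once (high-order relation).
    edge_feats = [mean_vectors([features[v] for v in edge]) for edge in hyperedges]
    updated = []
    for v in range(len(features)):
        incident = [edge_feats[k] for k, edge in enumerate(hyperedges) if v in edge]
        # Nodes outside every hyperedge keep their original features.
        updated.append(mean_vectors(incident) if incident else features[v][:])
    return updated


# Four hypothetical nodes with 2-D features; an ordinary graph edge joins exactly
# two nodes, whereas the first hyperedge below joins three nodes in one relation.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
hyperedges = [(0, 1, 2), (2, 3)]
updated = hypergraph_propagate(features, hyperedges)
print(updated)
```

Node 2 belongs to both hyperedges, so its updated feature mixes information from all four nodes after a single round. With only pairwise edges, collecting the same information would generally take more hops.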

Pages: 568-578

Citations: 0

Singular Value Decomposition-Driven Non-negative Matrix Factorization with Application to Identify the Association Patterns of Sarcoma Recurrence.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2024-09-01 Epub Date: 2024-03-01 DOI: 10.1007/s12539-024-00606-1

Jin Deng, Kaijun Li, Wei Luo

Sarcomas are malignant tumors arising from mesenchymal tissue and are characterized by their complexity and diversity. Their high recurrence rate makes it important to understand the mechanisms behind recurrence and to develop personalized treatments and drugs. However, previous studies on the association patterns of multi-modal data on sarcoma recurrence have overlooked the fact that genes do not act independently, but rather function within signaling pathways. Therefore, this study collected 290 whole solid images and data on 869 genes and 1387 pathways from more than 260 sarcoma samples in UCSC and TCGA to identify gene-pathway-cell association patterns related to sarcoma recurrence. Most multi-modal data fusion methods based on the joint non-negative matrix factorization (NMF) model suffer from poor experimental repeatability because the factorization parameters are initialized randomly. The study therefore proposed a singular value decomposition (SVD)-driven joint NMF model, in which the SVD method is used to compute the initial weight and coefficient matrices so that the results are reproducible. The experimental comparison indicated that SVD initialization enhances the performance of the joint NMF algorithm. Furthermore, the representative module indicated a significant relationship between genes in pathways and image features. Multi-level analysis provided valuable insights into the connections between biological processes, cellular features, and sarcoma recurrence. In addition, potential biomarkers were uncovered, and various mechanisms of sarcoma recurrence were identified from an imaging genetics perspective. Overall, the SVD-NMF model affords a novel perspective on combining multi-omics data to explore associations related to sarcoma recurrence.
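
The reproducibility argument can be sketched for a single matrix. In the example, the leading singular triplets are computed by deterministic power iteration, their absolute values seed the factors, and standard multiplicative updates refine them. This is an assumption-laden illustration: the abstract does not state the exact initialization formula or the joint multi-matrix objective, and the matrix `data` and all function names are hypothetical.

```python
import math

EPS = 1e-9


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def top_singular_triplets(a, k, iters=500):
    """Leading k singular triplets by power iteration with deflation."""
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        v = [1.0 + j for j in range(len(a[0]))]  # fixed start vector, no randomness
        for _ in range(iters):
            u = [sum(r * x for r, x in zip(row, v)) for row in residual]
            v = [sum(r * x for r, x in zip(col, u)) for col in zip(*residual)]
            norm = math.sqrt(sum(x * x for x in v)) or 1.0
            v = [x / norm for x in v]
        u = [sum(r * x for r, x in zip(row, v)) for row in residual]
        s = math.sqrt(sum(x * x for x in u))
        u = [x / s for x in u] if s else u
        triplets.append((s, u, v))
        # Deflate: remove the component just found before searching for the next one.
        residual = [[r - s * ui * vj for r, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return triplets


def svd_init(a, k):
    """Non-negative starting factors from absolute singular vectors (one simple choice)."""
    triplets = top_singular_triplets(a, k)
    w = [[math.sqrt(s) * abs(u[i]) + EPS for s, u, _ in triplets] for i in range(len(a))]
    h = [[math.sqrt(s) * abs(vj) + EPS for vj in v] for s, _, v in triplets]
    return w, h


def nmf(a, k, steps=200):
    """Multiplicative-update NMF for the Frobenius objective, started from svd_init."""
    w, h = svd_init(a, k)
    for _ in range(steps):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[hv * n / (d + EPS) for hv, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[wv * n / (d + EPS) for wv, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def frobenius_error(a, w, h):
    approx = matmul(w, h)
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb)))


# Hypothetical non-negative data matrix (illustrative values only).
data = [
    [5.0, 4.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [1.0, 0.0, 3.0, 2.0],
    [0.0, 1.0, 2.0, 4.0],
]
w, h = nmf(data, 2)
print(frobenius_error(data, w, h))
```

Because nothing in the pipeline draws random numbers, repeated calls to `nmf(data, 2)` return identical factors. Reproducibility of this kind is what an SVD-based initialization is meant to provide over random initialization.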

Pages: 554-567

Citations: 0