Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

DOTAD: A Database of Therapeutic Antibody Developability.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-26 | DOI: 10.1007/s12539-024-00613-2
Wenzhen Li, Hongyan Lin, Ziru Huang, Shiyang Xie, Yuwei Zhou, Rong Gong, Qianhu Jiang, ChangCheng Xiang, Jian Huang

The development of therapeutic antibodies is an important part of new drug discovery pipelines. Assessing an antibody's developability (its suitability for large-scale production and therapeutic use) is a particularly important step in this process. Because experimental assays for assessing antibody developability at scale are expensive and time-consuming, computational methods have become a more efficient alternative. However, the antibody research community is hampered by the scarcity of readily accessible data on antibody developability, which are essential for training and validating computational models. To address this gap, DOTAD (Database Of Therapeutic Antibody Developability) was built as the first database dedicated exclusively to curating therapeutic antibody developability information. DOTAD aggregates all available therapeutic antibody sequence data along with various developability metrics from the scientific literature, offering researchers a platform for data storage, retrieval, exploration, and downloading. Beyond serving as a repository, DOTAD integrates a web-based interface that features state-of-the-art tools for assessing antibody developability, so users can both access the data and analyze and interpret it. DOTAD is a valuable resource for the scientific community that facilitates therapeutic antibody research. It is freely accessible at http://i.uestc.edu.cn/DOTAD/ as an open data platform that supports the continued development of computational methods for antibody development.

Pages: 623-634. Publication type: Journal Article.
Citations: 0
BDM: An Assessment Metric for Protein Complex Structure Models Based on Distance Difference Matrix.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-27 | DOI: 10.1007/s12539-024-00622-1
Jiaqi Zhai, Wenda Wang, Ranxi Zhao, Daiwen Sun, Da Lu, Xinqi Gong

Protein complex structure prediction is an important problem in computational biology. While significant progress has been made for protein monomers, accurate evaluation of predicted protein complexes remains challenging. Existing assessment methods in CASP lack dedicated metrics for evaluating complexes, and DockQ, a widely used metric, has some limitations. In this study, we propose a novel metric called BDM (Based on Distance difference Matrix) for assessing predicted protein complex structures. Our approach uses a distance difference matrix derived from comparing real and predicted protein structures, and establishes a linear correlation with Root Mean Square Deviation (RMSD). BDM overcomes limitations associated with receptor-ligand differentiation and eliminates the requirement for structure alignment, making it a more effective and efficient metric. Evaluation on the CASP14 and CASP15 test sets demonstrates superior performance compared to the official CASP scoring. BDM provides accurate and reasonable assessments of predicted protein complexes, and its wide adoption has the potential to advance protein complex structure prediction and facilitate related research across scientific domains. Code is available at http://mialab.ruc.edu.cn/BDMServer/ .
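
The alignment-free property mentioned above can be illustrated with a minimal sketch. It is not the BDM formula, which the abstract does not give; it only builds a distance difference matrix from two coordinate sets and averages it. The function names, the toy coordinates, and the choice of a plain mean are all assumptions made for illustration.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_difference_matrix(native, model):
    """Element-wise |d_native(i, j) - d_model(i, j)| for matched residues."""
    if len(native) != len(model):
        raise ValueError("structures must have the same number of residues")
    d_native = distance_matrix(native)
    d_model = distance_matrix(model)
    return [[abs(x - y) for x, y in zip(row_n, row_m)]
            for row_n, row_m in zip(d_native, d_model)]

def mean_distance_difference(native, model):
    """Hypothetical summary score: mean over the upper triangle of the matrix."""
    ddm = distance_difference_matrix(native, model)
    n = len(ddm)
    if n < 2:
        return 0.0
    total = sum(ddm[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

# Toy "native" structure and three hypothetical models of it.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in native]   # pure translation
rotated = [(-y, x, z) for x, y, z in native]                    # 90 degrees about z
perturbed = native[:3] + [(0.0, 0.0, 2.0)]                      # one residue moved
```

Because internal distances do not change under translation or rotation, the shifted and rotated models score zero without any superposition step, while the perturbed model scores above zero.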

Pages: 677-687. Publication type: Journal Article.
Citations: 0
Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-08 | DOI: 10.1007/s12539-024-00615-0
Qian Deng, Jing Zhang, Jie Liu, Yuqi Liu, Zong Dai, Xiaoyong Zou, Zhanchao Li

As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate the pathogenesis of disease and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, and networks of disease semantic similarity, disease symptom similarity and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine the different phosphorylation site and disease similarity information, the random walk with restart algorithm was used to obtain the topology information of each network. Then, the diffusion component analysis method was used to obtain comprehensive phosphorylation site and disease similarities. Meanwhile, reliable negative samples were screened using a Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Under tenfold cross-validation, the model achieved an accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthews correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified against the literature and databases. These results show that the proposed method has outstanding prediction performance and high practical value.
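
The random walk with restart step can be sketched as follows. This is a generic textbook form of the algorithm, not the authors' implementation: the column normalization, the restart probability of 0.5, and the toy similarity matrix are assumptions chosen for illustration.

```python
def column_normalize(adj):
    """Scale each column to sum to 1 so the matrix acts as a transition matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical similarity network over four nodes (e.g. four phosphorylation sites).
similarity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
profile = random_walk_with_restart(similarity, seed=0)
```

The resulting profile is a probability vector over the nodes. Running the walk once per seed node yields one such vector per node, which is the kind of topology-aware representation that a later fusion step could consume.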

Pages: 649-664. Publication type: Journal Article.
Citations: 0
SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-12 | DOI: 10.1007/s12539-024-00611-4
Qingzu He, Huan Guo, Yulin Li, Guoqiang He, Xiang Li, Jianwei Shuai

Mass spectrometry is crucial in proteomics analysis, particularly when Data Independent Acquisition (DIA) is used for reliable and reproducible data acquisition with broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep learning software package for DIA proteome analysis, generates peptide results that may include low-confidence peptides. Conventionally, biologists have to manually screen peptide fragment ion chromatogram (XIC) peaks to identify high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm that aims to automate the identification of high-confidence peptides. Leveraging squeeze-and-excitation network and residual network models, SeFilter-DIA extracts XIC features and effectively discriminates between high-confidence and low-confidence peptides. Evaluation on benchmark datasets shows that SeFilter-DIA achieves 99.6% AUC on the test set and 97% on the other performance indicators. Furthermore, SeFilter-DIA is applicable to screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach for high-confidence peptide identification while mitigating the limitations of manual screening.
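
The squeeze-and-excitation idea named in the title can be sketched on a small channels-by-time feature map. This is a generic illustration of the block, not the SeFilter-DIA architecture: the weights are hand-set rather than learned, biases are omitted, and the toy traces are hypothetical.

```python
import math

def squeeze_excite(feature_map, w1, w2):
    """Generic squeeze-and-excitation over a channels x length feature map.

    w1 has shape (reduced, channels) and w2 has shape (channels, reduced).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: every channel is multiplied by its gate in (0, 1).
    scaled = [[g * x for x in channel] for g, channel in zip(gates, feature_map)]
    return scaled, gates

# Four hypothetical fragment-ion traces sampled at five retention-time points.
xic = [
    [0.0, 2.0, 5.0, 2.0, 0.0],
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [3.0, 0.0, 2.0, 0.0, 3.0],
]
w1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]          # 4 channels -> 2
w2 = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]]  # 2 -> 4 channels
scaled, gates = squeeze_excite(xic, w1, w2)
```

In a trained network the two weight matrices are learned, so the gates come to emphasize informative channels and damp uninformative ones.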

Pages: 579-592. Publication type: Journal Article.
Citations: 0
CHL-DTI: A Novel High-Low Order Information Convergence Framework for Effective Drug-Target Interaction Prediction.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-14 | DOI: 10.1007/s12539-024-00608-z
Shudong Wang, Yingye Liu, Yuanyuan Zhang, Kuijie Zhang, Xuanmo Song, Yu Zhang, Shanchen Pang

Recognizing drug-target interactions (DTI) is a pivotal element of drug discovery. Traditional wet-lab experiments, although valuable, are time-consuming and costly. Recently, computational methods grounded in network learning have demonstrated great advantages through effective topological feature extraction and have attracted extensive research attention. However, most existing network-based learning methods consider only the low-order binary correlation between an individual drug and target, neglecting the potential higher-order correlation information derived from multiple drugs and targets. High-order information is an essential component that complements low-order information. Hence, incorporating higher-order associations between drugs and targets, while adequately integrating them with the existing lower-order information, could potentially yield substantial breakthroughs in predicting drug-target interactions. We propose a novel dual-channel network-based learning model, CHL-DTI, that converges high-order information from hypergraphs and low-order information from an ordinary graph for drug-target interaction prediction. The convergence of high-order and low-order information in CHL-DTI is manifested in two key aspects. First, during the feature extraction stage, the model integrates both high-level semantic information and low-level topological information by combining hypergraphs and an ordinary graph. Second, CHL-DTI fully fuses the newly introduced drug-protein pairs (DPP) hypergraph network structure with ordinary topological network structure information. Extensive experiments on three public datasets show that CHL-DTI outperforms state-of-the-art (SOTA) methods in DTI prediction tasks. The source code of CHL-DTI is available at https://github.com/UPCLyy/CHL-DTI .
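
The difference between pairwise edges and hyperedges can be made concrete with a small sketch. It does not reproduce the CHL-DTI hypergraph construction, for which the linked repository is the reference. Here each target is assumed to define one hyperedge containing every drug that binds it, and one step of degree-normalized hypergraph smoothing is applied to a scalar drug feature. All names and interaction pairs are hypothetical.

```python
def build_incidence(pairs):
    """Incidence matrix H: rows are drugs, columns are targets (one hyperedge each)."""
    drugs = sorted({d for d, _ in pairs})
    targets = sorted({t for _, t in pairs})
    row = {d: i for i, d in enumerate(drugs)}
    col = {t: j for j, t in enumerate(targets)}
    h = [[0] * len(targets) for _ in drugs]
    for d, t in pairs:
        h[row[d]][col[t]] = 1
    return drugs, targets, h

def hypergraph_smooth(h, features):
    """One propagation step X' = Dv^-1 H De^-1 H^T X for scalar vertex features."""
    n, m = len(h), len(h[0])
    edge_degree = [sum(h[i][j] for i in range(n)) for j in range(m)]
    vertex_degree = [sum(row) for row in h]
    # Vertex -> hyperedge: average the features of all members of each hyperedge.
    edge_mean = [sum(h[i][j] * features[i] for i in range(n)) / edge_degree[j]
                 for j in range(m)]
    # Hyperedge -> vertex: average over the hyperedges each vertex belongs to.
    return [sum(h[i][j] * edge_mean[j] for j in range(m)) / vertex_degree[i]
            for i in range(n)]

# Hypothetical known interactions; every drug and target appears at least once.
pairs = [("D1", "T1"), ("D2", "T1"), ("D3", "T1"), ("D3", "T2"), ("D4", "T2")]
drugs, targets, incidence = build_incidence(pairs)
smoothed = hypergraph_smooth(incidence, [1.0, 0.0, 0.0, 0.0])
```

The feature placed on D1 reaches D2 and D3 in a single step because all three share the T1 hyperedge, a group-level relation that an ordinary drug-target graph expresses only as separate pairwise edges.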

Pages: 568-578. Publication type: Journal Article.
Citations: 0
Singular Value Decomposition-Driven Non-negative Matrix Factorization with Application to Identify the Association Patterns of Sarcoma Recurrence.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-03-01 | DOI: 10.1007/s12539-024-00606-1
Jin Deng, Kaijun Li, Wei Luo

Sarcomas are malignant tumors of mesenchymal tissue and are characterized by their complexity and diversity. Their high recurrence rate makes it important to understand the mechanisms behind recurrence and to develop personalized treatments and drugs. However, previous studies on the association patterns of multi-modal data on sarcoma recurrence have overlooked the fact that genes do not act independently, but rather function within signaling pathways. Therefore, this study collected 290 whole solid images, 869 gene and 1387 pathway data from over 260 sarcoma samples in UCSC and TCGA to identify gene-pathway-cell association patterns related to sarcoma recurrence. Meanwhile, because most multi-modal data fusion methods based on the joint non-negative matrix factorization (NMF) model have poor experimental repeatability due to random initialization of the factorization parameters, the study proposed a singular value decomposition (SVD)-driven joint NMF model, which uses SVD to compute the initial weight and coefficient matrices and thereby makes the results reproducible. The experimental comparison indicated that the SVD algorithm enhances the performance of the joint NMF algorithm. Furthermore, the representative module indicated a significant relationship between genes in pathways and image features. Multi-level analysis provided valuable insights into the connections between biological processes, cellular features, and sarcoma recurrence. In addition, potential biomarkers were uncovered, and various mechanisms of sarcoma recurrence were identified from an imaging genetics perspective. Overall, the SVD-NMF model affords a novel perspective on combining multi-omics data to explore associations related to sarcoma recurrence.
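
The reproducibility argument can be illustrated with a single-matrix sketch: seed NMF from the leading singular triplets instead of random numbers, then run standard multiplicative updates. This is not the paper's joint model. The absolute-value seeding is a simplified assumption (refined schemes such as NNDSVD exist), the SVD is approximated by power iteration with a fixed start vector, and the toy matrix is hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def leading_triplets(a, k, iters=500):
    """Top-k singular triplets via power iteration with deflation.

    The start vector is fixed, so the result is deterministic. A start vector
    orthogonal to the target singular vector would fail (not handled here).
    """
    rows, cols = len(a), len(a[0])
    residual = [row[:] for row in a]
    triplets = []
    for c in range(k):
        v = [1.0 / (j + 1 + c) for j in range(cols)]
        for _ in range(iters):
            u = [sum(r * x for r, x in zip(row, v)) for row in residual]
            v = [sum(residual[i][j] * u[i] for i in range(rows)) for j in range(cols)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break
            v = [x / norm for x in v]
        u = [sum(r * x for r, x in zip(row, v)) for row in residual]
        sigma = math.sqrt(sum(x * x for x in u))
        u = [x / sigma for x in u] if sigma else u
        triplets.append((sigma, u, v))
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(cols)]
                    for i in range(rows)]
    return triplets

def svd_init(a, k, eps=1e-6):
    """Non-negative seeds: W = |U| sqrt(S) and H = sqrt(S) |V^T|, plus a small eps."""
    trip = leading_triplets(a, k)
    w = [[abs(u[i]) * math.sqrt(s) + eps for s, u, _ in trip] for i in range(len(a))]
    h = [[abs(v[j]) * math.sqrt(s) + eps for j in range(len(a[0]))] for s, _, v in trip]
    return w, h

def nmf(a, k, steps=200, eps=1e-12):
    """Multiplicative-update NMF (Frobenius objective) from the SVD-based seed."""
    w, h = svd_init(a, k)
    for _ in range(steps):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(len(h[0]))]
             for r in range(k)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(len(w))]
    return w, h

def frobenius_error(a, w, h):
    approx = matmul(w, h)
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb)))

# Hypothetical non-negative data matrix with two clear blocks.
data = [[1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 3.0], [0.0, 0.0, 1.0, 1.0]]
w, h = nmf(data, 2)
```

Since no random numbers are involved, repeated calls return identical factors, which is the property the abstract attributes to SVD-driven initialization.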

Pages: 554-567. Publication type: Journal Article.
Citations: 0
A Multi-view Molecular Pre-training with Generative Contrastive Learning.
IF 3.9 | Zone 2 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-09-01 | Epub Date: 2024-05-06 | DOI: 10.1007/s12539-024-00632-z
Yunwu Liu, Ruisheng Zhang, Yongna Yuan, Jun Ma, Tongfeng Li, Zhixuan Yu

Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet learning how to represent molecules accurately remains challenging. Previous approaches that learn molecular representations in an end-to-end manner can suffer information loss and neglect molecular generative representations. To obtain rich molecular feature information, a pre-training model can use several different molecular representations to reduce the information loss caused by relying on a single one. We therefore propose MVGC, a multi-view generative contrastive learning pre-training model. Our pre-training framework learns three fundamental feature representations of molecules and effectively integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that the proposed MVGC model surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn chemically meaningful molecular representations.
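
Contrastive learning across views is commonly trained with an InfoNCE-style objective, sketched below. The abstract does not state which loss MVGC uses, so this is an illustration of the general technique only. The view names, embeddings, and temperature are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: row i of view_a should match row i of view_b.

    All other rows of view_b serve as negatives for that anchor.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical 2D embeddings of three molecules under two views.
graph_view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
sequence_view = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]            # agrees with graph_view
mismatched_view = [sequence_view[1], sequence_view[2], sequence_view[0]]

aligned_loss = info_nce(graph_view, sequence_view)
mismatched_loss = info_nce(graph_view, mismatched_view)
```

The loss is small when the two views embed the same molecule close together and large when the pairing is scrambled, which is the signal a pre-training stage would minimize.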

Citations: 0
A Novel Detection Method for High-Order SNP Epistatic Interactions Based on Explicit-Encoding-Based Multitasking Harmony Search.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-01 Epub Date: 2024-07-02 DOI: 10.1007/s12539-024-00621-2
Shouheng Tuo, Jiewei Jiang

To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) contributing to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), which exhibit small individual effects but potentially large joint effects. These interactions are difficult to detect due to the vast search space, encompassing billions of possible combinations, and the computational complexity of evaluating them. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) specifically designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm is employed, utilizing four lightweight evaluation functions, such as Bayesian network and entropy, to efficiently explore potential SNP combinations related to disease status. Second, a G-test statistical method is applied to filter out insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) as well as random forest (RF), are employed to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared to existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets encompassing age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results demonstrably indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in terms of both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git .
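The second stage described above, G-test filtering, can be illustrated with a small sketch. This is not the authors' implementation (their code is in the linked repository); it only shows the standard G statistic, G = 2 Σ O ln(O/E), computed on a contingency table of genotype combinations against case/control status. The function name `g_test` and the input layout (one genotype tuple and one label per sample) are assumptions made for illustration.

```python
import math
from collections import Counter


def g_test(genotypes, labels):
    """Return (G statistic, degrees of freedom) for a genotype-by-status table.

    genotypes: one hashable genotype combination per sample (hypothetical layout)
    labels:    one case/control label per sample
    """
    n = len(labels)
    observed = Counter(zip(genotypes, labels))  # O_ij, non-empty cells only
    row_totals = Counter(genotypes)             # samples per genotype combination
    col_totals = Counter(labels)                # samples per status
    g = 0.0
    for (geno, label), o in observed.items():
        expected = row_totals[geno] * col_totals[label] / n  # E_ij under independence
        g += 2.0 * o * math.log(o / expected)   # empty cells contribute 0 and are skipped
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g, dof


# Two-SNP combination perfectly associated with status in a toy sample.
g, dof = g_test([(0, 0), (0, 0), (1, 1), (1, 1)], [0, 0, 1, 1])
print(g, dof)
```

A combination would then be kept or discarded by comparing G with a chi-square threshold at the given degrees of freedom; the threshold used in the paper is not stated in the abstract.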

Citations: 0
GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-03-08 DOI: 10.1007/s12539-024-00609-y
Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang

Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including the human, C. elegans, Davis, and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
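For readers unfamiliar with the concordance index (CI) reported on KIBA, the sketch below shows the usual pairwise definition: among all pairs with different true affinities, the fraction whose predicted ordering agrees, with tied predictions counted as half. It is a plain O(n²) illustration, not the evaluation code used in the paper, and the function name `concordance_index` is chosen here for illustration.

```python
def concordance_index(y_true, y_pred):
    """Pairwise concordance index between true and predicted affinities."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true values are not comparable
            comparable += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5  # tied predictions count as half
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0  # predicted order matches true order
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


print(concordance_index([5.0, 6.2, 7.1], [4.8, 6.0, 7.5]))
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering.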

Citations: 0
GEnDDn: An lncRNA-Disease Association Identification Framework Based on Dual-Net Neural Architecture and Deep Neural Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-05-11 DOI: 10.1007/s12539-024-00619-w
Lihong Peng, Mengnan Ren, Liangliang Huang, Min Chen

Accumulating studies have demonstrated close relationships between long non-coding RNAs (lncRNAs) and diseases. Identifying new lncRNA-disease associations (LDAs) helps us better understand disease mechanisms and provides promising insights into targeted cancer therapy and anti-cancer drug design. Here, we present a deep learning-based LDA prediction framework called GEnDDn. GEnDDn comprises two main steps. First, features of lncRNAs and diseases are extracted by combining similarity computation, non-negative matrix factorization, and a graph attention auto-encoder, and each lncRNA-disease pair (LDP) is represented as a vector by concatenating the extracted features. Second, unknown LDPs are classified by aggregating a dual-net neural architecture and a deep neural network. Using six evaluation metrics, we found that GEnDDn surpassed four competing LDA identification methods (SDLDA, LDNFSGB, IPCARF, LDASR) on the lncRNADisease and MNDR databases under fivefold cross-validation on lncRNAs, diseases, LDPs, and independent lncRNAs and independent diseases. Ablation experiments further validated the LDA prediction performance of GEnDDn. Furthermore, we used GEnDDn to find potential lncRNAs associated with lung cancer and breast cancer. The results suggest that there may be close associations between IFNG-AS1 and lung cancer and between HIF1A-AS1 and breast cancer. These results require further biomedical experimental verification. GEnDDn is publicly available at https://github.com/plhhnu/GEnDDn.

Citations: 0