Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-10-14 | DOI: 10.1007/s12539-024-00661-8
Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan

Primary microRNAs (pri-miRNAs) have been observed to contain translatable small open reading frames (sORFs) that can encode peptides as independent elements. Relevant studies have shown that these sORFs play a significant role in regulating the expression of biological traits. Existing methods for predicting the coding potential of sORFs frequently overlook these data or categorize them as negative samples, impeding the identification of additional translatable sORFs in pri-miRNAs. In light of this, a novel method named misORFPred is proposed. Specifically, an enhanced scalable k-mer (ESKmer), which simultaneously integrates the composition information within a sequence and the distance information between sequences, is designed to extract nucleotide sequence features. After feature selection, the optimal features and several machine learning classifiers are combined to construct an ensemble model, in which a newly devised dynamic ensemble voting strategy (DEVS) dynamically adjusts the weights of the base classifiers and adaptively selects the optimal base classifiers for each unlabeled sample. Cross-validation results suggest that ESKmer and DEVS are essential for this classification task and boost model performance. Independent testing results indicate that misORFPred outperforms state-of-the-art methods. Furthermore, we run misORFPred on the genomes of various plant species and perform a thorough analysis of the predicted outcomes. Taken together, misORFPred is a powerful tool for identifying translatable sORFs in plant pri-miRNAs and can provide highly trusted candidates for subsequent biological experiments.
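As a rough illustration of the two ideas named in this abstract, the sketch below computes plain k-mer frequencies for a nucleotide sequence and combines base-classifier outputs with a per-sample weighted vote. It is not ESKmer or DEVS themselves, whose details the abstract does not give: the function names, the confidence-based weighting, and the `top_m` parameter are all assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Normalized counts of every length-k word, in lexicographic order.

    Plain k-mer composition only; the distance information that ESKmer
    adds is NOT modeled here.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA letters
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def confidence_weighted_vote(probs, top_m=2):
    """Hypothetical per-sample vote (an assumption, not the paper's DEVS).

    probs: positive-class probabilities from the base classifiers for ONE
    sample. The top_m most confident classifiers (farthest from 0.5) are
    kept and weighted by that confidence.
    """
    ranked = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_m]
    weights = [abs(p - 0.5) for p in ranked]
    if sum(weights) == 0:
        return 0.5  # every classifier is undecided
    return sum(w * p for w, p in zip(weights, ranked)) / sum(weights)
```

Calling `kmer_frequencies("ATGATG", k=2)` yields a 16-dimensional vector that sums to 1, and `confidence_weighted_vote([0.9, 0.6, 0.2])` drops the least confident classifier before averaging.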

Citations: 0
Plant lncRNA-miRNA Interaction Prediction Based on Counterfactual Heterogeneous Graph Attention Network.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-10-09 | DOI: 10.1007/s12539-024-00652-9
Yu He, ZiLan Ning, XingHui Zhu, YinQiong Zhang, ChunHai Liu, SiWei Jiang, ZheMing Yuan, HongYan Zhang

Identifying interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) provides a new perspective for understanding regulatory relationships in plant life processes. Recently, computational methods based on graph neural networks (GNNs) have been widely employed to predict lncRNA-miRNA interactions (LMIs), compensating for the limitations of biological experiments. However, the low semantic content and the noise of the graphs limit the performance of existing GNN-based methods. In this paper, we develop a novel Counterfactual Heterogeneous Graph Attention Network (CFHAN) to improve robustness against noise and the prediction of plant LMIs. First, we construct a lncRNA-miRNA (L-M) heterogeneous network from real-world data. Second, CFHAN uses node-level attention, semantic-level attention, and counterfactual links to enhance the learning of node embeddings. Finally, these embeddings are used as inputs to a multilayer perceptron (MLP) that predicts the interactions between lncRNAs and miRNAs. Evaluated on a benchmark dataset of plant LMIs, CFHAN outperforms five state-of-the-art methods, achieving an average AUC of 0.9953 and an average ACC of 0.9733. These results demonstrate CFHAN's ability to predict plant LMIs and its promising cross-species prediction ability, offering valuable insights for experimental LMI research.
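The node-level attention mentioned above can be sketched in the style of a standard graph attention layer: each neighbor receives a score, the scores are normalized with a softmax, and the neighbor embeddings are averaged with those weights. This is a minimal sketch under stated assumptions, not CFHAN itself. The learned linear transform is omitted, the attention vector `a` is supplied by hand, and semantic-level attention and counterfactual links are not shown.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation used when scoring node pairs."""
    return x if x > 0 else slope * x


def attention_aggregate(h_i, neighbors, a):
    """Aggregate neighbor embeddings for one target node.

    h_i:       embedding of the target node (list of floats)
    neighbors: list of neighbor embeddings, each the same length as h_i
    a:         attention vector of length 2 * len(h_i) (assumed given)
    Returns (attention weights, aggregated embedding).
    """
    # Score each neighbor from the concatenation [h_i || h_j].
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, h_i + h_j)))
        for h_j in neighbors
    ]
    # Numerically stable softmax over the neighborhood.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    # Attention-weighted sum of neighbor embeddings.
    out = [
        sum(alpha * h_j[d] for alpha, h_j in zip(alphas, neighbors))
        for d in range(len(h_i))
    ]
    return alphas, out
```

With an all-zero attention vector every neighbor gets the same weight, so the layer reduces to a plain mean over the neighborhood.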

Citations: 0
A Molecular Fragment Representation Learning Framework for Drug-Drug Interaction Prediction.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-10-09 | DOI: 10.1007/s12539-024-00658-3
Jiaxi He, Yuping Sun, Jie Ling

The concurrent use of multiple drugs may result in drug-drug interactions, increasing the risk of adverse reactions. Hence, computational methods that precisely identify unknown drug-drug interactions are crucial and of great significance for drug development and health. However, most recent studies have limited the drug-drug interaction prediction task to identifying interactions between substructures, overlooking molecular hierarchical information. Moreover, the substructures extracted by these methods are always restricted in number to the number of atoms in the molecular graph, which does not reflect real molecules. In this study, a molecular fragment representation learning framework for drug-drug interaction prediction is introduced. First, a fragment extraction module is designed to acquire a series of molecular fragments. Then, to capture more comprehensive features, molecular hierarchical information is integrated, enabling drug-drug interaction prediction by identifying pairwise interactions between the molecular fragments of each drug. Comprehensive evaluations demonstrate that the proposed method achieves state-of-the-art performance on both the DrugBank and Twosides datasets, with accuracy for unseen drugs improved by more than 20% on both datasets. Furthermore, case studies and visual analysis confirm that the proposed method can accurately identify the crucial substructures influencing the interactions, which are largely consistent with real functional groups. In conclusion, this method not only enhances the performance of drug-drug interaction prediction but also offers high interpretability. Source code is freely available at https://github.com/kennysyp/MFR-DDI.
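To make "pairwise interactions between molecular fragments" concrete, the sketch below scores every fragment of one drug against every fragment of another and squashes the summed score into a probability. It is a hypothetical illustration only: the dot-product scoring, the sigmoid readout, and all function names are assumptions, and the fragment embeddings are supplied by hand rather than learned. The authors' actual implementation is in the repository linked in the abstract.

```python
import math


def pairwise_fragment_scores(frags_a, frags_b):
    """Dot-product score for every (fragment of A, fragment of B) pair."""
    return [
        [sum(x * y for x, y in zip(fa, fb)) for fb in frags_b]
        for fa in frags_a
    ]


def interaction_probability(frags_a, frags_b):
    """Sum all pairwise scores and map the total to (0, 1) with a sigmoid."""
    scores = pairwise_fragment_scores(frags_a, frags_b)
    total = sum(sum(row) for row in scores)
    return 1.0 / (1.0 + math.exp(-total)), scores


def strongest_pair(scores):
    """Indices (i, j) of the fragment pair with the largest score."""
    pairs = ((i, j) for i, row in enumerate(scores) for j in range(len(row)))
    return max(pairs, key=lambda ij: scores[ij[0]][ij[1]])
```

The two drugs may contribute different numbers of fragments, and `strongest_pair` points at the pair that dominates the prediction, which is the kind of readout an interpretability analysis would inspect.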

Citations: 0
AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-10-05 | DOI: 10.1007/s12539-024-00662-7
Daixi Li, Yuqi Zhu, Wujie Zhang, Jing Liu, Xiaochen Yang, Zhihong Liu, Dongqing Wei

The structural stability of proteins is an important topic in fields such as biotechnology, pharmaceuticals, and enzymology. In particular, understanding the structural stability of proteins is crucial for protein design. Artificial design, while pursuing high thermodynamic stability and rigidity, inevitably sacrifices biological functions that are closely related to protein flexibility. Maximal thermodynamic stability is not always optimal for a protein to perform its biological function. Extensive theoretical and experimental screening is often required to obtain stable protein structures. It is therefore critically important to develop a stability prediction model based on the balance between protein stability and bioactivity. To design protein drugs with better functionality in a broader structural space, a novel protein structural stability predictor called PSSP has been developed in this study. PSSP is a mean-pooled dual graph convolutional network (GCN) model that uses the sequence characteristics, secondary structure, distance matrix, graph, and residue properties of a nanoprotein to provide rapid prediction and assessment. The model exhibits excellent robustness in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, it provides a rapid and accurate assessment of the structural stability of artificially designed proteins, which shows great promise for advancing protein design.
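A "mean-pooled" GCN can be read as graph convolution over residue nodes followed by averaging the node features into one vector per protein. The sketch below shows one standard GCN layer (symmetric normalization with self-loops, then ReLU) and a mean-pooling readout in plain Python. It is a minimal sketch, not PSSP: it shows a single branch rather than the dual network, and the adjacency matrix, features, and weights are placeholders chosen for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ feats @ weight)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    a_norm = [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    h = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]


def mean_pool(node_feats):
    """Average node features into a single graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]
```

Mean pooling makes the graph-level vector independent of the number of residues, so proteins of different lengths can feed the same downstream predictor.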

Citations: 0
CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-09-30 | DOI: 10.1007/s12539-024-00656-5
Minghui Du, Yuxiang Ren, Yang Zhang, Wenwen Li, Hongtao Yang, Huiying Chu, Yongshan Zhao

The sluggish pace of new antibacterial drug development reflects a vulnerability in the face of the current severe threat posed by bacterial resistance. Microbial natural products (NPs), as a reservoir of immense chemical potential, have emerged as the most promising avenue for the discovery of next-generation antibacterial agents. Directly accessing the antibacterial activity of potential products derived from biosynthetic gene clusters (BGCs) would significantly expedite the process. To tackle this issue, we propose a CSEL-BGC framework that integrates machine learning (ML) techniques. The framework involves the development of a novel cascade-stacking ensemble learning (CSEL) model and the establishment of a new model evaluation system. Based on this framework, we predict 6,666 BGCs with antibacterial activity from 3,468 complete bacterial genomes and elucidate a biosynthetic evolutionary landscape to reveal their antibacterial potential. This provides crucial insights for interpreting the synthesis and secretion mechanisms of unknown NPs.

Citations: 0
scCrab: A Reference-Guided Cancer Cell Identification Method based on Bayesian Neural Networks.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-09-30 | DOI: 10.1007/s12539-024-00655-6
Heyang Hua, Wenxin Long, Yan Pan, Siyu Li, Jianyu Zhou, Haixin Wang, Shengquan Chen

Cancer is a significant global public health concern, where early detection can greatly enhance curative outcomes. Therefore, the identification of cancer cells holds significant importance as the primary method for cancer diagnosis. The advancement of single-cell RNA sequencing (scRNA-seq) technology has made it possible to address the problem of cancer cell identification at the single-cell level more efficiently with computational methods, as opposed to the time-consuming and less reproducible manual identification methods. However, existing computational methods have shown suboptimal identification performance and a lack of capability to incorporate external reference data as prior information. Here, we propose scCrab, a reference-guided automatic cancer cell identification method, which performs ensemble learning based on a Bayesian neural network (BNN) with multi-head self-attention mechanisms and a linear regression model. Through a series of experiments on various datasets, we systematically validated the superior performance of scCrab in both intra- and inter-dataset predictions. Besides, we demonstrated the robustness of scCrab to dropout rate and sample size, and conducted ablation experiments to investigate the contributions of each component in scCrab. Furthermore, as a dedicated model for cancer cell identification, scCrab effectively captures cancer-related biological significance during the identification process.

Citations: 0
Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.
IF 3.9 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Pub Date: 2024-09-26 | DOI: 10.1007/s12539-024-00654-7
Yaowen Gu, Zidu Xu, Carl Yang

Computational drug repositioning, through predicting drug-disease associations (DDAs), offers significant potential for discovering new drug indications. Current methods apply graph neural networks (GNNs) to drug-disease heterogeneous networks to predict DDAs, achieving notable performance compared with traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Large language models (LLMs), in turn, excel in graph search and relational reasoning, which may enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation to drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text into continuous numerical representations. We then propose LLM-DDA with three different model architectures (LLM-DDA_{Node Feat}, LLM-DDA_{Dual GNN}, LLM-DDA_{GNN-AE}) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA_{GNN-AE} achieves the best performance among the compared models, which include 11 baselines, with overall relative improvements of 23.22% in AUPR, 17.20% in F1-score, and 25.35% in precision. Meanwhile, selected case studies involving prednisone and allergic rhinitis highlight the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning, as well as their generality and applicability to other biomedical relation prediction tasks.
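The abstract reports relative improvements without stating the formula. A relative improvement is commonly computed as the gain over a baseline divided by the baseline score, and the sketch below assumes that convention. The function name and the input values are illustrative only and are not taken from the paper.

```python
def relative_improvement(model_score, baseline_score):
    """Percentage gain of model_score over baseline_score.

    Assumes the usual convention (model - baseline) / baseline * 100;
    the paper may aggregate over baselines or datasets differently.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (model_score - baseline_score) / baseline_score * 100.0


# Illustrative values only; they are NOT results from the paper.
example = relative_improvement(0.60, 0.50)  # roughly a 20 percent gain
```

Under this convention a relative gain says nothing about the absolute level of the metric, so it is best read alongside the raw scores.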

Citations: 0
Bilinear Perceptual Fusion Algorithm Based on Brain Functional and Structural Data for ASD Diagnosis and Regions of Interest Identification
IF 4.8 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-10 DOI: 10.1007/s12539-024-00651-w
Jinxiong Fang, Da-fang Zhang, Kun Xie, Luyun Xu, Xia-an Bi

Autism spectrum disorder (ASD) is a serious mental disorder with a complex pathogenesis and variable presentation among individuals. Although many deep learning algorithms have been used to diagnose ASD, most of them focus on a single data modality, resulting in limited information extraction and poor stability. In this paper, we propose a bilinear perceptual fusion (BPF) algorithm that leverages data from multiple modalities. In our algorithm, different schemes are used to extract features according to the characteristics of functional and structural data. Bilinear operations capture the associations between the functional and structural features of each region of interest (ROI), and these associations are then used to integrate the feature representation. Because graph convolutional neural networks (GCNs) can effectively utilize topology and node features in brain network analysis, we design a deep learning framework called BPF-GCN and conduct experiments on a publicly available ASD dataset. The results show that BPF-GCN reached a classification accuracy of 82.35%, surpassing existing methods, and that the framework can extract ROIs related to ASD. Our work provides a valuable reference for the timely diagnosis and treatment of ASD.

Graphical Abstract

Based on the extracted functional and structural features, we design a generic framework called BPF-GCN. It can not only diagnose ASD but also identify pathogenic ROIs. BPF-GCN consists of four parts: extraction of brain functional features, extraction of brain structural features, feature fusion, and classification.
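
The bilinear step described in the abstract can be illustrated with a minimal sketch. It is not the authors' BPF implementation: the weight matrix `W`, the function names, and the choice to rescale and concatenate the two feature vectors are assumptions made for illustration only. The sketch computes the bilinear form f^T W s for one ROI and uses the resulting scalar to weight the fused representation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(f: Vector, W: Matrix, s: Vector) -> float:
    """Return f^T W s, a scalar association between two feature vectors."""
    if len(W) != len(f) or any(len(row) != len(s) for row in W):
        raise ValueError("W must have shape (len(f), len(s))")
    return sum(f[i] * W[i][j] * s[j]
               for i in range(len(f))
               for j in range(len(s)))


def fuse_roi(f: Vector, W: Matrix, s: Vector) -> Vector:
    """Hypothetical fusion: scale both modalities by the association, then concatenate."""
    a = bilinear_score(f, W, s)
    return [a * x for x in f] + [a * y for y in s]


# Toy functional (2-d) and structural (3-d) features for a single ROI.
functional = [1.0, 2.0]
structural = [0.5, 1.0, 1.5]
# In a trained model W would be learned; here it is fixed by hand.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

score = bilinear_score(functional, W, structural)
fused = fuse_roi(functional, W, structural)
```

In a framework such as the one described, one fused vector per ROI would presumably serve as a node feature for the graph convolutional network; that wiring is not shown here.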

Citations: 0
Protein Multiple Conformation Prediction Using Multi-Objective Evolution Algorithm.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-01 Epub Date: 2024-01-08 DOI: 10.1007/s12539-023-00597-5
Minghua Hou, Sirong Jin, Xinyue Cui, Chunxiang Peng, Kailong Zhao, Le Song, Guijun Zhang

The breakthrough of AlphaFold2 and the publication of AlphaFold DB represent a significant advance in the prediction of static protein structures. However, AlphaFold2 models tend to represent a single static structure, and multiple-conformation prediction remains a challenge. In this work, we propose a method named MultiSFold, which uses a distance-based multi-objective evolutionary algorithm to predict multiple conformations. First, multiple energy landscapes are constructed using different competing constraints generated by deep learning. Subsequently, an iterative modal exploration and exploitation strategy is designed to sample conformations, incorporating multi-objective optimization, geometric optimization, and structural similarity clustering. Finally, the final population is generated using a loop-specific sampling strategy to adjust the spatial orientations. MultiSFold was evaluated against state-of-the-art methods on a benchmark set of 80 protein targets, each characterized by two representative conformational states. Based on the proposed metric, MultiSFold achieves a success ratio of 56.25% in predicting multiple conformations, whereas AlphaFold2 achieves only 10.00%. This may indicate that conformational sampling combined with knowledge gained through deep learning has the potential to generate conformations spanning the range between different conformational states. In addition, MultiSFold was tested on 244 human proteins with low structural accuracy in AlphaFold DB to assess whether it could further improve the accuracy of static structures. In these experiments, MultiSFold achieved a TM-score 2.97% higher than that of AlphaFold2 and 7.72% higher than that of RoseTTAFold. The online server is at http://zhanglab-bioinf.com/MultiSFold.
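
Multi-objective evolutionary algorithms typically rank candidates by Pareto dominance rather than by a single score. The following is a minimal sketch of that selection step under the assumption that each conformation is scored by two competing energy terms to be minimized. It is not MultiSFold's code; the function names and toy energy values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(population: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [i for i, p in enumerate(population)
            if not any(dominates(q, p)
                       for j, q in enumerate(population) if j != i)]


# Toy scores: (energy under constraint set 1, energy under constraint set 2).
energies = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(energies)
```

Candidates on the front represent different trade-offs between the two energy landscapes, which is one way a population can retain more than one conformational state instead of collapsing to a single minimum.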

Citations: 0
Predicting circRNA-RBP Binding Sites Using a Hybrid Deep Neural Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-01 Epub Date: 2024-02-21 DOI: 10.1007/s12539-024-00616-z
Liwei Liu, Yixin Wei, Zhebin Tan, Qi Zhang, Jianqiang Sun, Qi Zhao

Circular RNAs (circRNAs) are non-coding RNAs generated by back-splicing. They are involved in biological processes and human diseases through interactions with specific RNA-binding proteins (RBPs). Because traditional biological experiments are costly, computational methods have been proposed to predict circRNA-RBP interactions. However, these methods rely on a single type of feature extraction. We therefore propose a novel model called circ-FHN, which uses only circRNA sequences to predict circRNA-RBP interactions. The circ-FHN approach involves feature coding and a hybrid deep learning model. Feature coding takes into account the physicochemical properties of circRNA sequences and employs four coding methods to extract sequence features. The hybrid deep structure comprises a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BiGRU). The CNN learns high-level abstract features, while the BiGRU captures long-term dependencies in the sequence. To assess the effectiveness of circ-FHN, we compared it with other computational methods on 16 datasets and conducted ablation experiments. Additionally, we conducted motif analysis. The results demonstrate that circ-FHN outperforms the other methods. circ-FHN is freely available at https://github.com/zhaoqi106/circ-FHN.
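
The abstract does not name the four coding methods, so the sketch below shows only one widely used sequence encoding, one-hot encoding, as an assumed example of how an RNA sequence can be turned into a numeric matrix suitable for a CNN. The function name and the handling of ambiguous bases are illustrative choices, not taken from circ-FHN.

```python
from typing import List

ALPHABET = "ACGU"  # column order of the one-hot matrix


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as a len(seq) x 4 matrix.

    DNA-style 'T' is treated as 'U'; any other symbol (e.g. 'N')
    becomes an all-zero row.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1 if base == b else 0 for b in ALPHABET])
    return rows


encoded = one_hot_encode("AUGN")
```

Each row corresponds to one nucleotide position, so a convolution over the rows scans local sequence patterns, while a recurrent layer such as a BiGRU can read the same positions in both directions.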

Citations: 0