
Briefings in Bioinformatics: latest articles

GloEC: a hierarchical-aware global model for predicting enzyme function.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae365
Yiran Huang, Yufu Lin, Wei Lan, Cuiyu Huang, Cheng Zhong

Annotating enzyme function is a fundamental challenge in industrial biotechnology and pathology. Numerous computational methods have been proposed to predict enzyme function by annotating enzymes with Enzyme Commission (EC) numbers. However, existing methods have difficulty modelling the hierarchical structure of enzyme labels from a global view, and they do not fully exploit the mutual interactions between different levels of the label hierarchy. In this paper, we formulate the hierarchy of enzyme labels as a directed enzyme graph and propose a hierarchy-GCN (Graph Convolutional Network) encoder to globally model enzyme label dependencies on this graph. Based on this hierarchy encoder, we develop an end-to-end hierarchical-aware global model named GloEC to predict enzyme function. GloEC learns hierarchical-aware enzyme label embeddings via the hierarchy-GCN encoder and performs deductive fusion of label-aware enzyme features to predict enzyme labels. The hierarchy-GCN encoder computes bidirectionally, capturing correlations between enzyme labels in both bottom-up and top-down manners, which has not previously been explored in enzyme function prediction. Comparative experiments on three benchmark datasets show that GloEC achieves better predictive performance than existing methods. Case studies also demonstrate that GloEC can effectively predict the function of isoenzymes. GloEC is available at: https://github.com/hyr0771/GloEC.
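
The idea of treating EC numbers as a directed label graph can be illustrated with a small sketch. This is not the GloEC implementation; it only assumes that every prefix of a four-level EC number (for example 1, 1.1, 1.1.1, 1.1.1.1) is a node, that consecutive levels are linked parent to child, and that a virtual root (the hypothetical name `ROOT`) groups the first-level classes. The function names are illustrative as well.

```python
from collections import defaultdict


def ec_prefixes(ec_number):
    """Return every hierarchy level of an EC number, from class down to full number."""
    parts = ec_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def build_ec_graph(ec_numbers, root="ROOT"):
    """Build a top-down map (parent -> children) and a bottom-up map (child -> parent)."""
    top_down = defaultdict(set)
    bottom_up = {}
    for ec in ec_numbers:
        parent = root  # hypothetical virtual root above the first-level classes
        for node in ec_prefixes(ec):
            top_down[parent].add(node)
            bottom_up[node] = parent
            parent = node
    return top_down, bottom_up


top_down, bottom_up = build_ec_graph(["1.1.1.1", "1.1.1.2", "2.7.11.1"])
print(sorted(top_down["1.1.1"]))  # children of EC 1.1.1
print(bottom_up["2.7.11"])        # parent of EC 2.7.11
```

Keeping both maps makes it possible to pass messages along the hierarchy in either direction, which is the property the abstract describes as bidirectional computation.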

Citations: 0
scLEGA: an attention-based deep clustering method with a tendency for low expression of genes on single-cell RNA-seq data.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae371
Zhenze Liu, Yingjian Liang, Guohua Wang, Tianjiao Zhang

Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at single-cell resolution. Inferring cell types within tissues is foundational for downstream research. Most existing methods for cell type inference from scRNA-seq data primarily use highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we designed a novel cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully considers the contribution of genes with lower expression levels, and it combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. It uses a low-expression-optimized denoising autoencoder, based on the novel ZINB model, to extract low-dimensional features and handle dropout events, together with a GCN-based graph autoencoder (GAE) that leverages neighbor information to guide dimensionality reduction. The iterative fusion of denoising and topological embeddings in scLEGA yields cluster-friendly cell representations in the hidden embedding, where similar cells are brought closer together. Compared with 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA demonstrates superior clustering accuracy, scalability, and stability. The scLEGA model code is freely available at https://github.com/Masonze/scLEGA-main.
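
For readers unfamiliar with ZINB losses, the following sketch evaluates the standard zero-inflated negative binomial negative log-likelihood for a single count. It is the textbook form, not the modified low-expression-aware loss proposed for scLEGA, whose details the abstract does not give. The parameter names `mu` (mean), `theta` (dispersion) and `pi` (probability of an extra zero) are conventional choices assumed here.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a zero-inflated negative binomial.

    pi is the probability that a zero is an extra ("dropout") zero.
    """
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        likelihood = pi + (1.0 - pi) * nb  # dropout zero or genuine NB zero
    else:
        likelihood = (1.0 - pi) * nb
    return -math.log(likelihood)


# Zero inflation makes an observed zero less surprising than under a plain NB.
print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.3) < zinb_nll(0, mu=2.0, theta=1.5, pi=0.0))
```

In a denoising autoencoder, `mu`, `theta` and `pi` would typically be predicted per gene and per cell by the decoder, and this quantity would be summed over the count matrix.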

Citations: 0
BPP: a platform for automatic biochemical pathway prediction.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae355
Xinhao Yi, Siwei Liu, Yu Wu, Douglas McCloskey, Zaiqiao Meng

A biochemical pathway consists of a series of interconnected biochemical reactions that accomplish specific life activities. The reactants and products of a pathway, including gene fragments, proteins, and small molecules, together form a complex reaction network. Biochemical pathways play a critical role in biochemistry because they reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks rely mainly on experimentation and pathway database analysis, both of which are constrained by substantial cost. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform for identifying potential links or attributes within biochemical pathway networks. BPP incorporates a variety of representation learning models, including recent hypergraph neural networks, to model biochemical reactions in pathways. In particular, BPP contains up-to-date biochemical pathway datasets and enables the prediction of potential participants or products of biochemical reactions in pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and to calculate the contribution of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of the platform. The BPP web portal, source code and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
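
A reaction with more than two participants maps naturally onto a hyperedge, which is why hypergraph models suit pathway data. The sketch below builds a node-by-reaction incidence matrix from a toy dictionary; it is an illustration only and does not reflect the BPP data format or code. The two reactions are the first two steps of glycolysis, and the helper name `build_incidence` is hypothetical.

```python
def build_incidence(reactions):
    """Return (nodes, incidence), where incidence[i][j] == 1 if node i takes part in reaction j."""
    nodes = sorted({member for members in reactions.values() for member in members})
    index = {name: i for i, name in enumerate(nodes)}
    incidence = [[0] * len(reactions) for _ in nodes]
    for j, members in enumerate(reactions.values()):
        for member in members:
            incidence[index[member]][j] = 1
    return nodes, incidence


# Toy example: each reaction (hyperedge) lists all of its participants.
reactions = {
    "hexokinase": ["glucose", "ATP", "glucose-6-phosphate", "ADP"],
    "phosphoglucose isomerase": ["glucose-6-phosphate", "fructose-6-phosphate"],
}
nodes, incidence = build_incidence(reactions)

# Node degree: the number of reactions each entity participates in.
degree = {name: sum(row) for name, row in zip(nodes, incidence)}
print(degree)
```

Predicting a missing participant then amounts to scoring candidate entries of this matrix, which is one way to read the link prediction task described in the abstract.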

Citations: 0
DiffPROTACs is a deep learning-based generator for proteolysis targeting chimeras.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae358
Fenglei Li, Qiaoyu Hu, Yongqi Zhou, Hao Yang, Fang Bai

PROteolysis TArgeting Chimeras (PROTACs) have recently emerged as a promising technology. However, the rational design of PROTACs, especially the linker component, remains challenging because structure-activity relationships and experimental data are lacking. Given the structural characteristics of PROTACs, fragment-based drug design (FBDD) provides a feasible approach for PROTAC research. Concurrently, artificial intelligence-generated content has attracted considerable attention, with diffusion models and Transformers emerging as indispensable tools in this field. In response, we present a new diffusion model, DiffPROTACs, which harnesses Transformers to learn and generate new PROTAC linkers based on given ligands. To introduce the inductive biases required for molecular generation, we propose the O(3) equivariant graph Transformer module, which augments Transformers with graph neural networks (GNNs), using Transformers to update nodes and GNNs to update the coordinates of PROTAC atoms. DiffPROTACs achieves performance comparable to existing models on two traditional FBDD datasets, ZINC and GEOM. To account for the differences in molecular characteristics between PROTACs and traditional small molecules, we fine-tuned the model on our self-built PROTACs dataset, achieving a 93.86% validity rate for generated PROTACs. Additionally, we provide a database of generated PROTACs for further research, which can be accessed at https://bailab.siais.shanghaitech.edu.cn/service/DiffPROTACs-generated.tgz. The corresponding code is available at https://github.com/Fenglei104/DiffPROTACs and the server is at https://bailab.siais.shanghaitech.edu.cn/services/diffprotacs.
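
O(3) equivariance means that rotating or reflecting the input atoms and then updating their coordinates gives the same result as updating first and transforming afterwards. The sketch below checks this for a simple EGNN-style coordinate update, in which each atom moves along its relative position vectors with weights that depend only on interatomic distance. It is an assumed, simplified stand-in and not the DiffPROTACs module; the weight function and all names are hypothetical.

```python
import math


def update_coords(coords, step=0.1):
    """One EGNN-style update: shift each atom along relative vectors weighted by distance."""
    new_coords = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            dist2 = sum(d * d for d in diff)
            weight = 1.0 / (1.0 + dist2)  # depends only on the invariant distance
            shift = [s + weight * d for s, d in zip(shift, diff)]
        new_coords.append([a + step * s for a, s in zip(xi, shift)])
    return new_coords


def rotate_z_90(p):
    """Rotate a point by 90 degrees about the z axis."""
    x, y, z = p
    return [-y, x, z]


def reflect_x(p):
    """Mirror a point through the yz plane."""
    x, y, z = p
    return [-x, y, z]


def same(a, b, tol=1e-9):
    """Compare two coordinate lists element by element."""
    return all(math.isclose(u, v, abs_tol=tol) for p, q in zip(a, b) for u, v in zip(p, q))


atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.3]]
for transform in (rotate_z_90, reflect_x):
    transformed_then_updated = update_coords([transform(p) for p in atoms])
    updated_then_transformed = [transform(p) for p in update_coords(atoms)]
    print(transform.__name__, same(transformed_then_updated, updated_then_transformed))
```

The update is equivariant because the weights are built from distances, which do not change under rotation or reflection, while the direction vectors transform together with the atoms.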

Citations: 0
PMiSLocMF: predicting miRNA subcellular localizations by incorporating multi-source features of miRNAs.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae386
Lei Chen, Jiahui Gu, Bo Zhou

MicroRNAs (miRNAs) play crucial roles in several biological processes. Detecting their subcellular localizations is essential for deeper insight into their functions and mechanisms. Traditional experimental methods for determining miRNA subcellular localizations are expensive, and computational methods offer a fast alternative. Although several computational methods have been proposed, their incomplete representations of miRNAs leave room for improvement. In this study, a novel computational method for predicting miRNA subcellular localizations, named PMiSLocMF, was developed. Because many miRNAs have multiple subcellular localizations, the method is a multi-label classifier. Several properties of miRNAs, namely miRNA sequences, miRNA functional similarity, and miRNA-disease, miRNA-drug, and miRNA-mRNA associations, were used to generate informative miRNA features. To this end, powerful algorithms [node2vec and graph attention auto-encoder (GATE)] and one newly designed scheme were adopted to process these properties, producing five feature types. All features were fed into self-attention and fully connected layers to make predictions. Cross-validation results indicated the high performance of PMiSLocMF, with accuracy higher than 0.83 and average area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) exceeding 0.90 and 0.77, respectively. This performance was better than that of all previous methods evaluated on the same dataset. Further tests showed that using all feature types improves the performance of PMiSLocMF, and that GATE and the self-attention layer both contribute to it. Finally, we analyzed in depth the influence of miRNA associations with diseases, drugs, and mRNAs on PMiSLocMF. The dataset and code are available at https://github.com/Gu20201017/PMiSLocMF.
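
Since the reported metric is an average AUC over several localization labels, a small sketch may help clarify how such a number is obtained in a multi-label setting. The code computes the AUC of each label as the fraction of correctly ordered positive-negative pairs and then averages over labels. Whether PMiSLocMF averages in exactly this way is an assumption, and the labels and scores below are made-up toy values, not results from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(label_matrix, score_matrix):
    """Average the per-label AUC over the columns of a multi-label problem."""
    n_labels = len(label_matrix[0])
    aucs = []
    for k in range(n_labels):
        y = [row[k] for row in label_matrix]
        s = [row[k] for row in score_matrix]
        aucs.append(roc_auc(y, s))
    return sum(aucs) / n_labels


# Toy data: 4 miRNAs, 2 hypothetical localization labels each.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]
print(macro_auc(y_true, y_score))  # 0.875
```

The first label is ranked perfectly (AUC 1.0) and the second has one misordered pair out of four (AUC 0.75), giving the average of 0.875.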

Citations: 0
HNCGAT: a method for predicting plant metabolite-protein interaction using heterogeneous neighbor contrastive graph attention network.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae397
Xi Zhou, Jing Yang, Yin Luo, Xiao Shen

Predicting metabolite-protein interactions (MPIs) is important for understanding basic life functions in plants. Compared with traditional experimental methods and high-throughput genomics methods based on statistical correlation, applying heterogeneous graph neural networks to MPI prediction in plants can reduce the cost in manpower, resources, and time. However, to the best of our knowledge, this application remains under-explored. In this work, we propose a novel model named heterogeneous neighbor contrastive graph attention network (HNCGAT) for the prediction of MPIs in Arabidopsis. HNCGAT employs a type-specific, attention-based neighborhood aggregation mechanism to learn node embeddings of proteins, metabolites, and functional annotations, and introduces a novel heterogeneous neighbor contrastive learning framework to preserve the topological structure of the heterogeneous network. Extensive experimental results and an ablation study demonstrate the effectiveness of the HNCGAT model for MPI prediction. In addition, a case study on our MPI prediction results supports the conclusion that HNCGAT can effectively predict potential MPIs in plants.
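
Neighbor contrastive learning pulls the embedding of a node towards its graph neighbors and pushes it away from other nodes. The sketch below shows a generic InfoNCE-style loss for one anchor node, with neighbors as positives and every other node as a negative. It is an assumed textbook formulation, not the HNCGAT objective; the temperature `tau`, the node names and the embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbor_contrastive_loss(embeddings, neighbors, anchor, tau=0.5):
    """InfoNCE-style loss for one anchor: neighbors are positives, all other nodes negatives."""
    sims = {
        node: math.exp(cosine(embeddings[anchor], emb) / tau)
        for node, emb in embeddings.items()
        if node != anchor
    }
    positive = sum(sims[j] for j in neighbors[anchor])
    return -math.log(positive / sum(sims.values()))


# Toy heterogeneous graph: one protein (P1) and two metabolites (M1, M2).
embeddings = {"P1": [1.0, 0.0], "M1": [0.9, 0.1], "M2": [0.0, 1.0]}
loss_close = neighbor_contrastive_loss(embeddings, {"P1": ["M1"]}, "P1")
loss_far = neighbor_contrastive_loss(embeddings, {"P1": ["M2"]}, "P1")
print(loss_close < loss_far)  # a nearby neighbor gives a lower loss
```

Minimizing such a loss over all nodes encourages embeddings that keep graph neighbors close, which is one way to preserve network topology as the abstract describes.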

Citations: 0
HHOMR: a hybrid high-order moment residual model for miRNA-disease association prediction.
IF 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-07-25 | DOI: 10.1093/bib/bbae412
Zhengwei Li, Lipeng Wan, Lei Wang, Wenjing Wang, Ru Nie

Numerous studies have demonstrated that microRNAs (miRNAs) are critically important for the prediction, diagnosis, and characterization of diseases. However, identifying miRNA-disease associations through traditional biological experiments is both costly and time-consuming. To further explore these associations, we propose a model based on hybrid high-order moments combined with element-level attention mechanisms (HHOMR). The model fuses hybrid higher-order statistical information with structural and community information. Specifically, we first construct a heterogeneous graph from existing associations between miRNAs and diseases. HHOMR employs a structural fusion layer to capture structure-level embeddings and a hybrid high-order moments encoder layer to enhance features. Element-level attention mechanisms are then used to adaptively integrate the features of these hybrid moments. Finally, a multi-layer perceptron calculates the association scores between miRNAs and diseases. In five-fold cross-validation on HMDD v2.0, HHOMR achieved a mean AUC of 93.28% and outperformed four state-of-the-art models. Additionally, case studies were conducted on three diseases (esophageal neoplasms, lymphoma, and prostate neoplasms). Among the top 50 miRNAs with the highest association scores for each disease, 46, 47, and 45, respectively, were confirmed by the dbDEMC and miR2Disease databases. Our results demonstrate that HHOMR not only outperforms existing models but also shows significant potential for predicting miRNA-disease associations.
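
To make the notion of high-order moments concrete, the sketch below aggregates the feature vectors of a node's neighbors using the mean and the second and third central moments, computed per feature dimension. It is a generic illustration under the assumption that moments are taken element-wise; the abstract does not specify how the HHOMR encoder defines or combines its moments, and the function names are hypothetical.

```python
def central_moment(values, order):
    """Mean for order 1, otherwise the central moment of the given order."""
    mean = sum(values) / len(values)
    if order == 1:
        return mean
    return sum((v - mean) ** order for v in values) / len(values)


def moment_features(neighbor_features, orders=(1, 2, 3)):
    """Concatenate element-wise moments of the neighbors' feature vectors."""
    dims = list(zip(*neighbor_features))  # one tuple of values per feature dimension
    return [central_moment(d, k) for k in orders for d in dims]


# Toy example: three neighbors, each with a two-dimensional feature vector.
neighbors = [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0]]
features = moment_features(neighbors)
print([round(f, 4) for f in features])
```

The mean alone cannot distinguish a symmetric neighborhood from a skewed one, whereas the higher moments can, which is the usual motivation for going beyond mean aggregation.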

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11341279/pdf/
Citations: 0
ReCIDE: robust estimation of cell type proportions by integrating single-reference-based deconvolutions.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-25 DOI: 10.1093/bib/bbae422
Minghan Li, Yuqing Su, Yanbo Gao, Weidong Tian

In this study, we introduce ReCIDE (Robust estimation of Cell type proportions by Integrating single-reference-based DEconvolutions), a framework that estimates cell type proportions robustly by integrating deconvolution results obtained from single references. ReCIDE outperforms existing approaches on benchmark and real datasets and is particularly strong at estimating the proportions of rare cell types. Through exploratory analysis of public bulk data from triple-negative breast cancer (TNBC) patients using ReCIDE, we demonstrate a significant correlation between the prognosis of TNBC patients and the proportions of both T cell and perivascular-like cell subtypes. Building on this discovery, we develop a prognostic assessment model for TNBC patients. Our contribution presents a novel framework for enhancing deconvolution accuracy and showcases its effectiveness in medical research.
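The idea of integrating several single-reference deconvolutions can be illustrated with a small sketch. The abstract does not state the integration rule, so the per-cell-type median followed by renormalization used here is an assumption chosen for its robustness to one outlying reference; it is not presented as ReCIDE's actual procedure. The function name and input layout are hypothetical.

```python
from statistics import median


def integrate_estimates(estimates):
    """Combine per-reference proportion estimates for one bulk sample.

    estimates: list of dicts mapping cell type -> estimated proportion,
    one dict per single-cell reference. A cell type missing from a
    reference is treated as 0 for that reference (an assumption).
    """
    cell_types = sorted(set().union(*estimates))
    # Assumed integration rule: median across references per cell type.
    combined = {ct: median(e.get(ct, 0.0) for e in estimates)
                for ct in cell_types}
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no non-zero proportions to integrate")
    # Renormalize so the integrated proportions sum to 1.
    return {ct: v / total for ct, v in combined.items()}
```

For example, if three references estimate a rare cell type at 0.02, 0.03, and 0.20, the median keeps the integrated value near 0.03 instead of letting the outlier dominate, which is the kind of robustness the abstract emphasizes.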

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11342246/pdf/
Citations: 0
Unveiling hidden connections in omics data via pyPARAGON: an integrative hybrid approach for disease network construction.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-25 DOI: 10.1093/bib/bbae399
Muslum Kaan Arici, Nurcan Tuncbag

Network inference or reconstruction algorithms play an integral role in analyzing and identifying causal relationships between omics hits in order to detect dysregulated and altered signaling components in various contexts, including disease states and drug perturbations. However, accurately representing signaling networks and identifying context-specific interactions from sparse omics datasets in complex interactomes pose significant challenges for integrative approaches. To address these challenges, we present pyPARAGON (PAgeRAnk-flux on Graphlet-guided network for multi-Omic data integratioN), a novel tool that combines network propagation with graphlets. pyPARAGON enhances accuracy and minimizes the inclusion of nonspecific interactions in signaling networks by exploiting network structure rather than relying on pairwise connections among proteins. Through comprehensive evaluations on benchmark signaling pathways, we demonstrate that pyPARAGON outperforms state-of-the-art approaches in node propagation and edge inference. Furthermore, pyPARAGON exhibits promising performance in discovering cancer driver networks. Notably, we demonstrate its utility in network-based stratification of patient tumors by integrating phosphoproteomic data from 105 breast cancer tumors with the interactome and identifying tumor-specific signaling pathways. Overall, pyPARAGON is a novel tool for analyzing and integrating multi-omic data in the context of signaling networks. pyPARAGON is available at https://github.com/netlab-ku/pyPARAGON.
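Network propagation from a set of omics hits is commonly done with a personalized PageRank, and a generic version can be sketched in plain Python. This is a textbook power iteration written for illustration only; it is not the PageRank-flux computation or the API of pyPARAGON, and the function name, the adjacency-dict input, and the default parameters are all assumptions.

```python
def personalized_pagerank(adj, seeds, damping=0.85, iters=100):
    """Propagate scores from seed nodes over a graph.

    adj: dict mapping every node to a list of its neighbours (each
    neighbour must also be a key of adj). seeds: non-empty set of nodes
    that receive the restart probability.
    """
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # Restart step: a (1 - damping) share of the mass returns to seeds.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
                continue
            share = damping * rank[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        rank = nxt
    return rank
```

On a path graph A-B-C-D seeded at A, the scores decay with distance from the seed, so nodes that are only weakly connected to the hits receive little weight. Filtering on such scores is one simple way to limit nonspecific interactions.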

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11334722/pdf/
Citations: 0
Detecting microsatellite instability by length comparison of microsatellites in the 3' untranslated region with RNA-seq.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-25 DOI: 10.1093/bib/bbae423
Jin-Wook Choi, Jin-Ok Lee, Sejoon Lee

Microsatellite instability (MSI), a phenomenon caused by deficiencies in the deoxyribonucleic acid (DNA) mismatch repair system, is an important biomarker in cancer research and clinical diagnostics. MSI detection often involves next-generation sequencing data, with many studies focusing on DNA. Here, we introduce a novel approach that measures microsatellite lengths directly from ribonucleic acid sequencing (RNA-seq) data and compares their distributions to detect MSI. Our findings reveal distinct instability patterns between MSI-high (MSI-H) and microsatellite-stable samples, indicating the efficacy of RNA-based MSI detection. Additionally, microsatellites in the 3'-untranslated regions showed the greatest predictive value for MSI detection. Notably, this efficacy extends to detecting MSI-H samples even in tumors not commonly associated with MSI. Our approach highlights the utility of RNA-seq data in MSI detection, facilitating more precise diagnostics through the integration of various biological data.
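Comparing microsatellite length distributions between a sample and a reference can be sketched as below. The abstract does not name the statistic used, so the total variation distance and the 0.3 threshold are illustrative assumptions, as are the function names and the per-locus input layout (a list of repeat lengths observed in reads covering each locus).

```python
from collections import Counter


def length_distribution(lengths):
    """Turn observed repeat lengths at one locus into relative frequencies."""
    if not lengths:
        raise ValueError("no reads cover this locus")
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def unstable_fraction(sample, reference, threshold=0.3):
    """Fraction of shared loci whose length distribution differs from the
    reference by more than `threshold` (a hypothetical cut-off)."""
    shared = [locus for locus in sample if locus in reference]
    if not shared:
        raise ValueError("sample and reference share no loci")
    unstable = sum(
        total_variation(length_distribution(sample[locus]),
                        length_distribution(reference[locus])) > threshold
        for locus in shared
    )
    return unstable / len(shared)
```

A sample whose reads at many loci show shortened or lengthened repeats relative to the reference yields a high unstable fraction, which is the kind of signal that separates MSI-H from microsatellite-stable samples in length-based approaches.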

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11361843/pdf/
Citations: 0