A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq

IF 11.5 | Zone 2, Biology | JCR Q1, Genetics & Heredity | Genomics, Proteomics & Bioinformatics | Pub Date: 2023-02-01 | DOI: 10.1016/j.gpb.2022.09.005
Wenbin Ye, Qiwei Lian, Congting Ye, Xiaohui Wu
Genomics, Proteomics & Bioinformatics, volume 21, issue 1, pages 67-83. Journal article.
Publisher page: https://www.sciencedirect.com/science/article/pii/S1672022922001218
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/ff/97/main.PMC10372920.pdf
Citations: 6

Abstract

Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3′ untranslated region, tissue-specific, cross-species, and single-cell pA prediction.

Source journal
Genomics, Proteomics & Bioinformatics (Biochemistry, Genetics and Molecular Biology: Biochemistry)
CiteScore: 14.30
Self-citation rate: 4.20%
Articles published: 844
Review time: 61 days
About the journal: Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, and CSCD, among others.