Jianhong Ou, Haibo Liu, Sungmi Park, Michael R Green, Lihua Julie Zhu
Frontiers in bioscience (Scholar edition), 16(4):21. Published 2024-12-17. DOI: 10.31083/j.fbs1604021
InPAS: An R/Bioconductor Package for Identifying Novel Polyadenylation Sites and Alternative Polyadenylation from Bulk RNA-seq Data.
Background: Alternative cleavage and polyadenylation (APA) is a crucial post-transcriptional mechanism that regulates gene expression in eukaryotes by increasing the diversity and complexity of both the transcriptome and the proteome. More than a dozen experimental methods for identifying and quantifying APA events have been developed over the last decade, but technical, financial, and time constraints have limited their widespread adoption. Consequently, APA remains poorly understood in most eukaryotes. However, RNA sequencing (RNA-seq) has revolutionized transcriptome profiling, and recent studies have shown that RNA-seq data can be leveraged to identify and quantify APA events.
Results: To take full advantage of the exponentially growing volume of RNA-seq data, we developed InPAS (Identification of Novel alternative PolyAdenylation Sites). InPAS is an R/Bioconductor package for accurately identifying novel and known cleavage and polyadenylation sites (CPSs) and for quantifying APA from RNA-seq data of various experimental designs. Compared to other APA analysis tools, InPAS offers several important advantages. It can detect both novel proximal and novel distal CPSs, fine-tune CPS positions using a naïve Bayes classifier based on flanking sequence features, and identify APA events from RNA-seq data of complex experimental designs using linear models. We benchmarked the performance of InPAS and other leading tools using simulated and experimental RNA-seq data with matched 3'-end RNA-seq data. Our results show that InPAS frequently outperforms existing tools in terms of precision, sensitivity, and specificity. Furthermore, we demonstrate its scalability and versatility by applying it to large, diverse RNA-seq datasets.
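The abstract says that InPAS refines CPS positions with a naïve Bayes classifier based on flanking sequence features, but it does not list those features. The Python sketch below illustrates the general idea under explicit assumptions. It is not InPAS's R implementation or API. The names `features`, `BernoulliNaiveBayes`, and `refine_site` are hypothetical. So are the three binary features (a AAUAAA/AUUAAA hexamer 10 to 35 nt upstream, a CA dinucleotide just before the cleavage site, and a crude U-/GU-rich downstream element), the window sizes, and the toy training set. The sketch scores every position near a coverage-based candidate site and keeps the one with the highest posterior log-odds of being a CPS.

```python
import math

# Hypothetical motif set: two common mammalian poly(A) signal hexamers (DNA alphabet).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def features(seq: str, pos: int) -> tuple[bool, bool, bool]:
    """Binary flanking-sequence features for a putative cleavage site (illustrative only).

    `pos` is the 0-based index of the first base downstream of the cut,
    so seq[:pos] is upstream and seq[pos:] is downstream.
    """
    up = seq[max(0, pos - 35):max(0, pos - 10)]     # 10-35 nt upstream of the cut
    down = seq[pos:pos + 30]                         # first 30 nt downstream of the cut
    has_pas = any(h in up for h in PAS_HEXAMERS)
    has_ca = pos >= 2 and seq[pos - 2:pos] == "CA"   # CA dinucleotide immediately before the cut
    has_dse = "TGTGT" in down or "TTTTT" in down     # crude GU-/U-rich downstream element
    return (has_pas, has_ca, has_dse)


class BernoulliNaiveBayes:
    """Two-class Bernoulli naive Bayes with Laplace smoothing; labels are True (CPS) / False."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Assumes both classes occur in y (otherwise the prior would be log(0)).
        self.log_prior, self.p_present = {}, {}
        for label in (True, False):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.log_prior[label] = math.log(len(rows) / len(y))
            # Smoothed estimate of P(feature j is present | class).
            self.p_present[label] = [
                (sum(r[j] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(len(X[0]))
            ]
        return self

    def log_joint(self, x, label):
        s = self.log_prior[label]
        for present, p in zip(x, self.p_present[label]):
            s += math.log(p if present else 1.0 - p)
        return s

    def log_odds(self, x):
        """log P(CPS | x) - log P(not CPS | x); the shared evidence term cancels."""
        return self.log_joint(x, True) - self.log_joint(x, False)


def refine_site(model, seq, candidate, radius=10):
    """Return the position within +/- radius of `candidate` with the highest log-odds."""
    lo = max(2, candidate - radius)
    hi = min(len(seq), candidate + radius)
    return max(range(lo, hi + 1), key=lambda pos: model.log_odds(features(seq, pos)))


# Toy training vectors (has_pas, has_ca, has_dse). In practice, positives might be
# sites supported by 3'-end sequencing and negatives random 3' UTR positions.
X = [(True, True, True), (True, True, False), (True, False, True), (False, True, True),
     (False, False, False), (False, True, False), (False, False, True), (True, False, False)]
y = [True] * 4 + [False] * 4
model = BernoulliNaiveBayes().fit(X, y)

# Synthetic 3' end: PAS at 30-35, CA at 58-59, U-rich run at 70-74; coverage suggests ~55.
seq = "G" * 30 + "AATAAA" + "G" * 22 + "CA" + "G" * 10 + "TTTTT" + "G" * 45
print(refine_site(model, seq, candidate=55))  # → 60
```

The classifier only re-ranks positions close to a coverage-derived candidate. In the toy example, it shifts the site from 55 to 60, the only position in the window where all three motifs line up. A practical model would likely use richer sequence features and larger labelled training sets.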
Conclusions: InPAS is an efficient and robust tool for identifying and quantifying APA events from readily accessible conventional RNA-seq data. Its versatility opens the door to exploring APA regulation across diverse eukaryotic systems and experimental designs. We believe that InPAS will drive APA research forward, deepen our understanding of the role of APA in regulating gene expression, and potentially lead to the discovery of disease biomarkers or therapeutics.