Statistical methodology for ribosomal frameshift detection

Alisa Yurovsky, Justin Gardin, B. Futcher, S. Skiena
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Published 2022-08-07.
DOI: 10.1145/3535508.3545529 (https://doi.org/10.1145/3535508.3545529)

Abstract

During normal protein synthesis, the ribosome shifts along the messenger RNA (mRNA) by exactly three nucleotides for each amino acid added to the protein being translated. However, in special cases, the sequence of the mRNA somehow induces the ribosome to slip, which shifts the "reading frame" in which the mRNA is translated and gives rise to an otherwise unexpected protein. Such "programmed frameshifts" are well known in viruses, including coronavirus, and a few cases of programmed frameshifting are also known in cellular genes. However, there is no good way, either experimental or informatic, to identify novel cases of programmed frameshifting. Thus it is possible that substantial numbers of cellular proteins generated by programmed frameshifting in humans and other organisms remain unknown. Here, we build on prior work observing that data from ribosome profiling can be analyzed for anomalies in mRNA reading frame periodicity to identify putative programmed frameshifts. We develop a statistical framework to identify all likely frameshift positions in a genome, even for very low frameshifting rates. We also develop a frameshift simulator for ribosome profiling data to verify our algorithm. We show high sensitivity of prediction on the simulated data, retrieving 97.4% of the simulated frameshifts. Furthermore, our method found all three of the known yeast genes with programmed frameshifts. Our results suggest there could be a large number of unannotated alternative proteins in the yeast genome, generated by programmed frameshifting. This motivates further study and parallel investigations in the human genome.
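
The idea of looking for anomalies in reading frame periodicity can be illustrated with a small sketch. This is not the statistical framework described in the abstract, whose details are not given here. It is a simple stand-in that assumes each ribosome footprint has already been reduced to a single P-site position measured from the start of the coding sequence, and it compares the frame distribution (position mod 3) upstream and downstream of a candidate site with a Pearson chi-square statistic. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def frame_counts(psite_positions, start, end):
    """Count reads with P-site in [start, end), grouped by frame (position mod 3)."""
    counts = Counter(p % 3 for p in psite_positions if start <= p < end)
    return [counts.get(frame, 0) for frame in range(3)]


def chi_square_2x3(up, down):
    """Pearson chi-square statistic for a 2x3 table of frame counts."""
    total = sum(up) + sum(down)
    if total == 0:
        return 0.0
    stat = 0.0
    for row in (up, down):
        row_total = sum(row)
        for frame in range(3):
            col_total = up[frame] + down[frame]
            expected = row_total * col_total / total
            if expected > 0:  # skip frames with no reads on either side
                stat += (row[frame] - expected) ** 2 / expected
    return stat


def frame_shift_statistic(psite_positions, candidate, cds_length):
    """Compare frame usage before and after a candidate position."""
    up = frame_counts(psite_positions, 0, candidate)
    down = frame_counts(psite_positions, candidate, cds_length)
    return chi_square_2x3(up, down)


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Toy data (made up): a 300-nt coding sequence with one read per codon.
# In `shifted`, reads move from frame 0 to frame 1 after position 150.
shifted = [3 * i for i in range(50)] + [150 + 3 * i + 1 for i in range(50)]
unshifted = [3 * i for i in range(100)]

shifted_stat = frame_shift_statistic(shifted, 150, 300)
unshifted_stat = frame_shift_statistic(unshifted, 150, 300)
print(shifted_stat, p_value_df2(shifted_stat))
print(unshifted_stat, p_value_df2(unshifted_stat))
```

On the toy data the shifted gene scores far higher than the unshifted one. Real ribosome profiling data are much noisier, frameshifting rates can be low, and scanning every position in every gene would require correction for multiple testing, none of which this sketch attempts.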