Vipin Thomas, Navya Raj, Deepthi Varughese, Naveen Kumar, Seema Sehrawat, Abhinav Grover, Shailja Singh, Pawan K Dhar, Achuthsankar S Nair
Predicting stable functional peptides from the intergenic space of E. coli.
Systems and Synthetic Biology. Journal Article. Published 2015-12-01 (Epub 2015-05-29). doi:10.1007/s11693-015-9172-z
Citations: 1
Abstract
Expression of synthetic proteins from intergenic regions of E. coli, and their functional association, was recently demonstrated (Dhar et al. in J Biol Eng 3:2, 2009. doi:10.1186/1754-1611-3-2). This raised the question: if one can make 'user-defined' genes from the non-coding genome, how big is the artificially translatable genome? (Dinger et al. in PLoS Comput Biol 4, 2008; Frith et al. in RNA Biol 3(1):40-48, 2006a; Frith et al. in PLoS Genet 2(4):e52, 2006b). To answer this question, we performed a bioinformatics study of all reported E. coli intergenic sequences in search of novel peptides and proteins not expressed in nature. Overall, 2500 E. coli intergenic sequences were computationally translated into 'protein sequence equivalents' and matched against all known proteins. Sequences that showed no resemblance to known proteins were used to build a comprehensive profile in terms of structure, function, localization, interactions, stability, and so on. A total of 362 protein sequences encoded by the intergenic sequences of the E. coli genome showed evidence of stable tertiary conformations. Experimental studies are underway to confirm some of the key predictions. This study points to a vast untapped repository of functional molecules lying undiscovered in the non-expressed genome of various organisms.
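The computational translation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the translation was done, so everything here is an assumption: translating all six reading frames, using the standard genetic code, and keeping stop-free stretches above a length cutoff. The function names (`reverse_complement`, `translate`, `six_frame`, `stop_free_segments`) and the cutoff value are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: six-frame translation of a DNA sequence
# using the standard genetic code. Not the authors' actual pipeline.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order line up with the amino-acid string above.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Translate a sequence in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def stop_free_segments(protein: str, min_len: int) -> list:
    """Split at stop codons ('*') and keep stretches of at least min_len."""
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


if __name__ == "__main__":
    # Toy input, not a real E. coli intergenic sequence.
    for frame, protein in six_frame("ATGGCCTAA").items():
        print(frame, protein, stop_free_segments(protein, min_len=2))
```

In a workflow like the one the abstract outlines, the resulting 'protein sequence equivalents' would then be compared against known proteins with a sequence search tool, and only sequences with no detectable resemblance would be carried forward. That comparison step is not shown here.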