Probing the Randomness of Proteins by Their Subsequence Composition

A. Apostolico, F. Cunial
2009 Data Compression Conference, published 2009-03-16. DOI: 10.1109/DCC.2009.60 (https://doi.org/10.1109/DCC.2009.60)
Citations: 3

Abstract

The quantitative underpinning of the information contents of biosequences represents an elusive goal and yet also an obvious prerequisite to the quantitative modeling and study of biological function and evolution. Previous studies have consistently exposed a tenacious lack of compressibility on the part of biosequences. This leaves open the question of what distinguishes them from random strings, the latter being clearly unpalatable to the living cell. This paper assesses the randomness of biosequences in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. Experimental results show the potential of the method in distinguishing a protein sequence from its random reshuffling, as well as in tasks of classification and clustering.
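The abstract does not define the new parameters or the constraint placed on subsequences. As a minimal sketch of the general idea only, assuming that a "constrained" subsequence is a length-k subsequence whose positions all fit inside a window of w consecutive residues, the Python below contrasts a sequence's subsequence vocabulary with that of random reshufflings (same residue composition, order destroyed). The functions `substrings`, `constrained_subsequences`, `shuffled` and `vocabulary_ratio` and the parameters `k`, `w`, `trials` and `seed` are hypothetical illustrations, not the authors' method.

```python
import random
from itertools import combinations
from statistics import mean


def substrings(seq: str, k: int) -> set[str]:
    """Distinct contiguous k-mers (substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def constrained_subsequences(seq: str, k: int, w: int) -> set[str]:
    """Distinct length-k subsequences whose positions all fall inside a
    window of w consecutive positions.

    ASSUMPTION: the window constraint is a hypothetical stand-in; the
    abstract does not say how the paper constrains subsequences.
    """
    vocab: set[str] = set()
    for start, first in enumerate(seq):
        # Anchor the first character at `start`, then pick the remaining
        # k-1 characters, in order, from the next w-1 positions.
        window = seq[start + 1:start + w]
        for idx in combinations(range(len(window)), k - 1):
            vocab.add(first + "".join(window[i] for i in idx))
    return vocab


def shuffled(seq: str, rng: random.Random) -> str:
    """Random reshuffling: same residue composition, order destroyed."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def vocabulary_ratio(seq: str, k: int, w: int,
                     trials: int = 20, seed: int = 0) -> float:
    """Observed vocabulary size divided by its mean over `trials` shuffles
    (hypothetical statistic for illustration)."""
    rng = random.Random(seed)  # fixed seed keeps the baseline reproducible
    observed = len(constrained_subsequences(seq, k, w))
    baseline = mean(
        len(constrained_subsequences(shuffled(seq, rng), k, w))
        for _ in range(trials)
    )
    return observed / baseline
```

With w equal to k the window admits only contiguous picks, so `constrained_subsequences` reduces to `substrings`; widening w moves from the substring view to the subsequence view, at a cost of C(w-1, k-1) candidates per position, so w should stay small. A ratio well away from 1, for example from `vocabulary_ratio(seq, k=3, w=6)` on a protein string of your choice, would hint that residue order carries structure that shuffling erases; in which direction real proteins fall is an empirical question this sketch does not settle.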