A simple method for finding related sequences by adding probabilities of alternative alignments
Martin C Frith. Genome Research, published 2024-08-16. DOI: 10.1101/gr.279464.124
Citations: 0
Abstract
The main way of analyzing genetic sequences is to find sequence regions that are related to each other. There are many methods for this, usually based on one idea: find an alignment of two sequence regions that would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell whether an alignment is likely to exist by chance. Moreover, the precise alignment of related regions is uncertain, so a single alignment does not hold all the evidence that they are related; alternative alignments should be considered too. This is rarely done, because we lack a simple, fast method that fits easily into practical sequence-search software. Described here is the simplest conceivable change to standard sequence alignment: summing the probabilities of alternative alignments. This makes it easier to tell whether a similarity is likely to occur by chance. In at least a few tests, this approach finds distant relationships better than standard alignment does. It can be used in practical sequence-search software with minimal increase in implementation difficulty or run time, and it generalizes to other kinds of alignment, e.g. DNA-versus-protein alignment with frameshifts. Thus, it can contribute widely to finding subtle relationships between sequences.
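The abstract's core idea is to replace the single best alignment with a sum over alternative alignments. A toy dynamic program can illustrate it. What follows is a minimal sketch, not the paper's implementation, and rests on these assumptions:

- Linear gap costs.
- A +1/-1 DNA match/mismatch score.
- Uniform base frequencies. These give the scale factor λ = ln 3, since (1/4)e^λ + (3/4)e^-λ = 1.
- The summed score is reported as the largest per-cell total.

The function names `best_local_score` and `summed_local_score` are hypothetical.

```python
import math


def score_pair(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    # Toy substitution score (assumption): +1 for identical bases, -1 otherwise.
    return match if a == b else mismatch


def logsumexp(values: list[float]) -> float:
    # Numerically stable log(sum(exp(v) for v in values)).
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def best_local_score(x: str, y: str, gap: int = 2) -> int:
    # Smith-Waterman with a linear gap cost: the score of the single best
    # local alignment, found by taking a max over alternatives in each cell.
    m, n = len(x), len(y)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            h[i][j] = max(
                0,  # the empty alignment: start afresh after this cell
                h[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]),
                h[i - 1][j] - gap,
                h[i][j - 1] - gap,
            )
            best = max(best, h[i][j])
    return best


def summed_local_score(x: str, y: str, gap: int = 2,
                       lam: float = math.log(3)) -> float:
    # Same recurrence, but max is replaced by a sum of exp(lam * score).
    # Cell (i, j) holds log(sum of exp(lam * score)) over all local alignments
    # ending there, including the empty one (exp(0) = 1). Edge cells hold only
    # the empty alignment, mirroring the zero boundary of Smith-Waterman.
    m, n = len(x), len(y)
    w = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w[i][j] = logsumexp([
                0.0,
                w[i - 1][j - 1] + lam * score_pair(x[i - 1], y[j - 1]),
                w[i - 1][j] - lam * gap,
                w[i][j - 1] - lam * gap,
            ])
            best = max(best, w[i][j])
    # Convert back to score units so it can be compared with best_local_score.
    return best / lam


if __name__ == "__main__":
    x, y = "ACGTTGCA", "ACGTAGCA"
    print("best alignment score:", best_local_score(x, y))
    print("summed score:", round(summed_local_score(x, y), 3))
```

In every cell, the log-sum is at least the matching Smith-Waterman value times λ. So the summed score is never below the best-alignment score, and any excess reflects support from alternative alignments. Judging whether either score is likely by chance still needs a significance model, which this sketch does not include.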
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.