A Time and Space Efficient Algorithm for Alignment and Visualization of Sparse Retro-elements over Whole Genome Scale

2011 Second International Conference on Intelligent Systems, Modelling and Simulation. Pub Date: 2011-01-25. DOI: 10.1109/ISMS.2011.29

Sun-young Park, Seon-Yeong Kim, Dong-Sung Ryu, Hwan-Gue Cho


Citations: 0

Abstract

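Aligning whole-genome DNA sequences to find useful biomarkers is difficult because a whole genome contains more than 10 megabases, so the straightforward Smith-Waterman algorithm, which needs O(N^2) time and space, is impractical. When the DNA segments of interest are very sparse, an O(N^2) dynamic programming alignment is also wasteful, because its cost depends on the total length of the input strings rather than on the number of markers they contain. The problem is especially clear for retro-elements: a whole genome is far longer than the few (about 10 to 20) retro-elements we want to compare. In this paper we formulate a simplified alignment problem over only two kinds of symbols, retro-elements and everything else, so that a whole genome can be regarded as a binary string of '1' (retro-element) and '0' (other). We propose an alignment algorithm for such binary strings that reveals the structural similarity of retro-elements between two genomes. Its time complexity is O(M^2), where M is the number of retro-elements in the genomes. To demonstrate its usefulness, we studied the structural similarity of all HERV (Human Endogenous Retrovirus) elements in four representative primates: human, chimpanzee, orangutan, and rhesus monkey. Our system revealed the similarity distribution of retro-elements across the four primates.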
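The abstract does not give the scoring scheme, so the following is only a minimal sketch of how an alignment whose cost depends on M rather than N might look. It assumes each genome is reduced to the sorted positions of its '1' symbols and runs a Needleman-Wunsch-style global alignment over those position lists, rewarding paired elements whose distance from the preceding element is similar. The function names `to_sparse`, `spacings`, and `align_sparse` and the constants `MATCH_BONUS`, `SKIP_PENALTY`, and `SPACING_WEIGHT` are hypothetical illustrations, not the authors' method. For scale: with N = 10^7 bases an N × N table has 10^14 cells, whereas M = 20 elements give only 400.

```python
from typing import List, Tuple

# Hypothetical scoring parameters (assumptions, not taken from the paper).
MATCH_BONUS = 10.0            # reward for pairing two retro-elements
SKIP_PENALTY = 4.0            # cost of leaving a retro-element unpaired
SPACING_WEIGHT = 1.0 / 1000.0 # cost per base of spacing mismatch


def to_sparse(binary: str) -> List[int]:
    """Positions of the '1' symbols (retro-elements) in a 0/1 string."""
    return [i for i, c in enumerate(binary) if c == "1"]


def spacings(pos: List[int]) -> List[int]:
    """Distance from each element to the previous one (first: from position 0)."""
    prev = [0] + pos[:-1]
    return [p - q for p, q in zip(pos, prev)]


def align_sparse(a: List[int], b: List[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment over element lists: O(len(a) * len(b)) time and space."""
    m, n = len(a), len(b)
    sa, sb = spacings(a), spacings(b)
    # score[i][j]: best score aligning the first i elements of a with the first j of b
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    move = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0], move[i][0] = -SKIP_PENALTY * i, "up"
    for j in range(1, n + 1):
        score[0][j], move[0][j] = -SKIP_PENALTY * j, "left"
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = (score[i - 1][j - 1] + MATCH_BONUS
                    - SPACING_WEIGHT * abs(sa[i - 1] - sb[j - 1]))
            up = score[i - 1][j] - SKIP_PENALTY    # element a[i-1] left unpaired
            left = score[i][j - 1] - SKIP_PENALTY  # element b[j-1] left unpaired
            best = max(pair, up, left)
            score[i][j] = best
            move[i][j] = "diag" if best == pair else ("up" if best == up else "left")
    # Trace back from the bottom-right cell to recover the paired positions.
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[m][n], pairs


if __name__ == "__main__":
    a = to_sparse("00100001000010")  # toy genome A, '1' marks a retro-element
    b = to_sparse("01000010000100")  # toy genome B
    total, pairs = align_sparse(a, b)
    print(f"score={total:.3f} pairs={pairs}")
```

On the two toy strings, the three elements pair up as (2, 1), (7, 6), (12, 11). The table has (M_A + 1)(M_B + 1) cells, each filled in constant time, which is consistent with the O(M^2) bound stated in the abstract. In practice the positions would come from an existing repeat annotation rather than from scanning a full 0/1 string, since that scan alone costs O(N).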



