A Time and Space Efficient Algorithm for Alignment and Visualization of Sparse Retro-elements over Whole Genome Scale

Sun-young Park, Seon-Yeong Kim, Dong-Sung Ryu, Hwan-Gue Cho
2011 Second International Conference on Intelligent Systems, Modelling and Simulation, published 2011-01-25. DOI: 10.1109/ISMS.2011.29 (https://doi.org/10.1109/ISMS.2011.29)

Abstract

Aligning the DNA sequences of whole genomes to find useful biomarkers (in the general sense) is not easy, because a whole genome has more than 10 megabases. We therefore do not apply the straightforward Smith-Waterman algorithm, which takes O(N^2) time and space. When the DNA segments of interest are very sparse, the O(N^2) dynamic programming algorithm is inefficient, because its cost depends on the total length N of the input string rather than on the number of markers appearing in it. The problem becomes very clear for retro-elements over a whole genome, since the genome is far larger than the few (about 10-20) retro-elements we want to compare. In this paper we propose a different alignment problem that involves only two kinds of symbols: retro-elements and all other symbols. The whole genome can thus be regarded as a binary string of '1' (retro-element) and '0' (others). We propose an alignment algorithm for this simplified binary string that reveals the structural similarity of retro-elements across two different genomes. The time complexity of the algorithm is O(M^2), where M denotes the number of retro-elements appearing in the genomes. To show the usefulness of the algorithm, we studied the structural similarities of all HERV (Human Endogenous Retrovirus) elements in four typical primates: human, chimpanzee, orangutan, and rhesus monkey. Our system successfully revealed the similarity distribution of retro-elements for the four primates.
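The abstract does not give the scoring scheme or the recurrence, so the following is only a minimal sketch of how an alignment whose cost depends on M rather than N might look. It assumes each genome is reduced to the sorted positions of its '1' symbols and that the gaps between consecutive retro-elements are aligned with a Needleman-Wunsch-style table. The function names, the `gap_similarity` score, and the `skip_penalty` value are all hypothetical and are not taken from the paper.

```python
from typing import List


def sparse_positions(binary: str) -> List[int]:
    """Return the indices of '1' symbols (retro-elements) in a binary string."""
    return [i for i, symbol in enumerate(binary) if symbol == "1"]


def gaps(positions: List[int]) -> List[int]:
    """Distances between consecutive retro-elements (always >= 1)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def gap_similarity(g1: int, g2: int) -> float:
    """Hypothetical score in (0, 1]; equals 1.0 when both gaps are equal."""
    return min(g1, g2) / max(g1, g2)


def align_gaps(a: List[int], b: List[int], skip_penalty: float = 0.5) -> float:
    """Global alignment score of two gap lists.

    The table has (len(a) + 1) x (len(b) + 1) cells, so the work grows with
    the number of retro-elements, not with the genome length.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = -skip_penalty * i
    for j in range(1, cols):
        score[0][j] = -skip_penalty * j
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + gap_similarity(a[i - 1], b[j - 1]),
                score[i - 1][j] - skip_penalty,  # skip a gap in genome A
                score[i][j - 1] - skip_penalty,  # skip a gap in genome B
            )
    return score[-1][-1]


# Toy inputs: two short binary strings with three retro-elements each.
genome_a = "0010000100000010"
genome_b = "010000100000010"
score_ab = align_gaps(gaps(sparse_positions(genome_a)),
                      gaps(sparse_positions(genome_b)))
```

In this toy case both strings yield the gap list [5, 7], so the two gap pairs match exactly even though the strings differ in length and offset. A real whole-genome input would supply annotated retro-element coordinates directly rather than scanning a binary string, which is itself an O(N) pass.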