Fast algorithms for recognizing retroviruses
W. Ashlock, S. Datta
2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), November 2010
DOI: 10.1109/GENSIPS.2010.5719668
Retroviruses play important roles in medicine, evolution, and biology. A key step towards understanding their effect on hosts is identifying them in the host genome. Detecting retroviruses by sequence alignment is difficult because they are very diverse and have high mutation rates. We propose a fast, accurate algorithm for detecting retroviruses that uses supervised machine learning and three sets of features. One set of novel features identifies the characteristic reading frame structure of retroviruses. The other two sets include features that other researchers have used for exon finding. Our algorithm distinguishes, with high accuracy, retroviral genomes from non-coding sequences, and endogenous retroviruses from both non-coding sequences and genes. It also distinguishes endogenous retroviruses from intact retroviral genomes, and lentiviruses from other retroviruses, again with high accuracy.
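The abstract does not say how reading-frame structure is turned into features. The Python sketch below shows one plausible encoding, offered as an illustration only. For each of the six reading frames it measures the longest run of codons that contains no stop codon (TAA, TAG, TGA). The function names, the length normalization, and the `min_codons` threshold of 100 are hypothetical choices, not the authors' definitions. Retroviral genomes typically carry several long open reading frames (gag, pol, env), so they should score high on these features. Non-coding sequence, where stop codons occur frequently, tends not to.

```python
# Hypothetical sketch of reading-frame features; NOT the features defined by
# Ashlock and Datta. The threshold and feature layout are illustrative assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; unknown bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def longest_open_stretch(seq: str, frame: int) -> int:
    """Longest run of consecutive non-stop codons, in codons, in frame 0, 1 or 2."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):  # only complete codons
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon closes the current open stretch
        else:
            current += 1  # codons with ambiguous bases are treated as open
            best = max(best, current)
    return best


def reading_frame_features(seq: str, min_codons: int = 100) -> list[float]:
    """Hypothetical feature vector built from all six reading frames.

    Returns the six longest-open-stretch lengths, each divided by the number
    of complete codons in the sequence and sorted in descending order,
    followed by the number of frames whose open stretch reaches `min_codons`
    (an assumed threshold).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    lengths = [longest_open_stretch(seq, f) for f in range(3)]
    lengths += [longest_open_stretch(rc, f) for f in range(3)]
    n_codons = max(len(seq) // 3, 1)  # avoid division by zero on tiny inputs
    normalized = sorted((length / n_codons for length in lengths), reverse=True)
    long_frames = sum(1 for length in lengths if length >= min_codons)
    return normalized + [float(long_frames)]
```

A vector like this could be concatenated with exon-finding features and passed to any supervised classifier. This sketch does not reproduce the specific classifier and feature sets the authors used.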