Infinite string block matching features for DNA classification

D. Ashlock, Sierra Gillis, W. Ashlock
Published in: 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2017-08-01
DOI: 10.1109/CIBCB.2017.8058529 (https://doi.org/10.1109/CIBCB.2017.8058529)
Citations: 4

Abstract

Automatic classification of DNA can be performed in a number of ways using a variety of features. This study introduces a novel technique for generating global features for DNA classification. Based on a new technique, the “do what's possible” representation, infinite string generators are evolved to produce strings with a maximized collection of matching blocks above a critical length in the target DNA. Most global DNA features, such as GC-content or those in spectrum string kernels, capture diffuse statistical information about the target DNA. Infinite string matching is based on multiple loci, and thus finds a different type of global feature than most techniques now in use. It is discovered that the block-matching score for evolved infinite string generators is able to cleanly separate high-entropy synthetic DNA data sets using a single feature threshold classifier. Preliminary evaluation on human endogenous retrovirus sequences shows that evolved infinite string generators locate promising features on biological data as well.
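
To make the contrast in the abstract concrete, the following Python sketch computes two diffuse global features (GC-content and a k-mer spectrum) alongside a simplified block-matching score. The abstract does not give the paper's scoring rule, so `block_match_score`, its greedy non-overlapping matching, and the parameter name `min_len` are assumptions made here for illustration; they are not the authors' method. The sketch also scores a finite prefix of a generated string and does not model the evolved generators or the "do what's possible" representation.

```python
from collections import Counter


def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C: a diffuse, whole-sequence statistic."""
    return sum(base in "GC" for base in dna) / len(dna)


def kmer_spectrum(dna: str, k: int) -> Counter:
    """Counts of every length-k substring, the feature map used by spectrum kernels."""
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


def block_match_score(generated: str, target: str, min_len: int) -> int:
    """Hypothetical score (an assumption, not the paper's definition): total length
    of greedy, non-overlapping blocks of `generated` that occur somewhere in
    `target` and are at least `min_len` characters long."""
    score = 0
    i = 0
    while i + min_len <= len(generated):
        length = min_len
        if generated[i:i + length] not in target:
            i += 1  # no block of the critical length starts here
            continue
        # Extend the block one character at a time while it still occurs in target.
        while i + length < len(generated) and generated[i:i + length + 1] in target:
            length += 1
        score += length
        i += length  # blocks do not overlap
    return score


if __name__ == "__main__":
    target = "TTACGTACGGA"
    prefix = "ACGTACGT"  # finite prefix standing in for a generated string
    print(gc_content(target))
    print(kmer_spectrum(target, 3).most_common(2))
    print(block_match_score(prefix, target, min_len=4))
```

Under these assumptions, a single-feature threshold classifier of the kind the abstract mentions would compare such a score against a cutoff to assign a class.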