Seq2Image: Sequence Analysis using Visualization and Deep Convolutional Neural Network

Neda Tavakoli
Published in: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020-07-01
DOI: 10.1109/COMPSAC48688.2020.00-71 (https://doi.org/10.1109/COMPSAC48688.2020.00-71)
Citations: 10

Abstract

Sequence classification is widely used across many application domains. Many classification algorithms operate on feature vectors, but they cannot be applied directly to sequences, mainly because it is difficult to extract feature vectors from them. More specifically, because the features in a sequence are inherently sequential, the clustering problem in sequences suffers from the curse of dimensionality, which makes sequence classification more challenging than typical classification on feature vectors. In this paper, we present a novel idea called Seq2Image, a simple yet effective method that transforms sequences into images in order to classify genomic sequences with a Convolutional Neural Network (CNN). We first convert a given genomic sequence to a tensor and then transform the obtained tensor into an image. We then apply CNN-based deep learning image processing techniques to classify the resulting images. The results of our preliminary experimental study are very promising: 95.78% training accuracy, 95.76% validation accuracy, and 95.83% testing accuracy when classifying 166 human genome samples from six different sequence families.
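
The abstract does not say how the tensor or the image is constructed, so the following is only a minimal sketch of the sequence-to-tensor-to-image step under stated assumptions: a one-hot encoding over the four nucleotides, zero-padding to a fixed length, and scaling to 8-bit grayscale. The function names `sequence_to_tensor` and `tensor_to_image` are hypothetical and are not taken from the paper.

```python
# Assumption: one-hot encoding over A/C/G/T. The paper's actual encoding
# is not described in the abstract.
NUCLEOTIDES = "ACGT"


def sequence_to_tensor(seq, length):
    """One-hot encode a DNA string into a `length` x 4 matrix (list of rows).

    The sequence is truncated or zero-padded to `length`. Symbols outside
    A/C/G/T (for example 'N') become all-zero rows.
    """
    rows = []
    for base in seq.upper()[:length]:
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    while len(rows) < length:
        rows.append([0, 0, 0, 0])
    return rows


def tensor_to_image(tensor, max_value=255):
    """Scale a 0/1 tensor to grayscale pixel intensities in [0, max_value]."""
    return [[value * max_value for value in row] for row in tensor]


# Each row of the image is one sequence position; each column is a nucleotide.
image = tensor_to_image(sequence_to_tensor("ACGTN", length=6))
for row in image:
    print(row)
```

Images produced this way all share the same shape, so in principle they could be passed to any CNN image classifier. The choice of padding length and pixel scaling here is illustrative only.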