Seq2Image: Sequence Analysis using Visualization and Deep Convolutional Neural Network

2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). Pub Date: 2020-07-01. DOI: 10.1109/COMPSAC48688.2020.00-71

Neda Tavakoli


Citations: 10

Abstract

Sequence classification is widely used across application domains. Many classification algorithms operate on feature vectors, but they cannot be applied directly to sequences, mainly because it is difficult to extract feature vectors from them. More specifically, because the features in a sequence are inherently ordered, the clustering problem in sequences suffers from the curse of dimensionality, which makes sequence classification more challenging than typical classification on feature vectors. In this paper, we present Seq2Image, a novel, simple yet effective method that transforms sequences into images and performs genomic sequence classification with a Convolutional Neural Network (CNN). We first convert a given genomic sequence to a tensor and then transform that tensor into an image. We then apply CNN-based deep learning image processing techniques to classify the resulting images. In a preliminary experimental study, the method achieves 95.78% training accuracy, 95.76% validation accuracy, and 95.83% testing accuracy when classifying human genome sequences from 166 samples across six sequence families.
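The abstract does not say how a sequence is encoded or how the tensor is laid out as an image, so the following is only a minimal sketch of the sequence-to-tensor-to-image idea under stated assumptions: one-hot encoding over the four nucleotides, and folding the resulting tensor row by row into a fixed-width image. The names `BASES`, `seq_to_tensor`, `tensor_to_image`, and the `width` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

# Assumed nucleotide-to-channel order; the paper's actual encoding is not
# stated in the abstract.
BASES = "ACGT"


def seq_to_tensor(seq: str) -> np.ndarray:
    """One-hot encode a DNA string into a (len(seq), 4) float tensor.

    Symbols outside BASES (for example N) are left as all-zero rows.
    """
    tensor = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            tensor[i, j] = 1.0
    return tensor


def tensor_to_image(tensor: np.ndarray, width: int) -> np.ndarray:
    """Fold a (L, 4) tensor into a (H, width, 4) uint8 image.

    Positions past the end of the sequence are zero-padded.
    """
    length, channels = tensor.shape
    height = -(-length // width)  # ceiling division
    padded = np.zeros((height * width, channels), dtype=tensor.dtype)
    padded[:length] = tensor
    return (padded.reshape(height, width, channels) * 255).astype(np.uint8)


# Example: an 8-base sequence folded into rows of 3 pixels.
image = tensor_to_image(seq_to_tensor("ACGTNACG"), width=3)
print(image.shape)
```

An array of this shape could then be passed to any image classifier, such as a CNN; the architecture used in the paper is not described in the abstract, so none is sketched here.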



