Seq2Image: Sequence Analysis using Visualization and Deep Convolutional Neural Network
Neda Tavakoli. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), published 2020-07-01. DOI: https://doi.org/10.1109/COMPSAC48688.2020.00-71
Sequence classification is widely used across many application domains. Many classification algorithms exist that can be applied to feature vectors. However, these algorithms cannot be applied directly to the sequence classification problem, mainly because it is difficult to extract feature vectors from sequences. More specifically, because the features in a sequence are sequential in nature, the clustering problem in sequences suffers from the curse of dimensionality, which makes sequence classification more challenging than typical classification on feature vectors. In this paper, we present Seq2Image, a novel idea of transforming sequences into images and a simple yet effective method for genomic sequence classification using a Convolutional Neural Network (CNN). We first convert a given genomic sequence to a tensor, and then transform the resulting tensor into an image. We then apply CNN-based deep learning image processing techniques to classify the images created from the sequences. The results of our preliminary experimental study are promising: the method achieves 95.78% training accuracy, 95.76% validation accuracy, and 95.83% testing accuracy when classifying a human genome dataset of 166 samples with six different sequence families.
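The sequence-to-tensor-to-image steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the tensor is built, so the one-hot encoding over the alphabet A, C, G, T used here is an assumption, as are the function names `sequence_to_tensor`, `tensor_to_image`, and `to_pgm`, the fixed `length` parameter, and the choice of plain-text PGM as the image format. It is not the author's implementation, only one plausible way to turn a sequence into a pixel grid that a CNN could consume.

```python
# Assumed encoding (not from the paper): one image row per nucleotide.
ALPHABET = "ACGT"


def sequence_to_tensor(seq, length):
    """One-hot encode `seq` into a 4 x `length` matrix.

    Sequences longer than `length` are truncated; shorter ones are
    zero-padded on the right. Symbols outside ALPHABET (e.g. 'N')
    leave their column all-zero.
    """
    seq = seq.upper()[:length]
    tensor = [[0] * length for _ in ALPHABET]
    for col, base in enumerate(seq):
        row = ALPHABET.find(base)
        if row >= 0:
            tensor[row][col] = 1
    return tensor


def tensor_to_image(tensor):
    """Scale 0/1 entries to 8-bit grayscale pixel values (0 or 255)."""
    return [[255 * value for value in row] for row in tensor]


def to_pgm(image):
    """Serialize the pixel grid as a plain-text PGM (P2) image string."""
    height, width = len(image), len(image[0])
    rows = [" ".join(str(pixel) for pixel in row) for row in image]
    return "\n".join(["P2", f"{width} {height}", "255", *rows]) + "\n"


# Toy input, not data from the paper.
image = tensor_to_image(sequence_to_tensor("ACGTNAC", length=8))
pgm_text = to_pgm(image)
```

Fixing `length` gives every sequence the same image size, which a CNN with a fixed input shape would need. Writing `pgm_text` to a `.pgm` file yields an image that common viewers can open.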