An Approach to DNA Sequence Classification through Machine Learning
Author: Sapna Juneja
Journal: International Journal of Reliable and Quality E-Healthcare
Publication date: 2022-04-01
DOI: 10.4018/ijrqeh.299963 (https://doi.org/10.4018/ijrqeh.299963)
Citation count: 1
Abstract
Machine learning (ML) has been instrumental in supporting decision making from relevant historical data in many domains, including bioinformatics. In bioinformatics, distinguishing natural genes from genes affected by disease (referred to as invalid genes) is a very complex task. To determine the applicability of a new protein through genomic research, DNA sequences need to be classified. The current work identifies classes of DNA sequences using a machine learning algorithm. These classes depend on the sequence of nucleotides: even a fractional mutation in the sequence produces a corresponding change in the class. Each numeric instance representing a class is linked to a gene family, such as G protein-coupled receptors, tyrosine kinases, and synthases. In this paper, we applied the classification algorithm to three types of datasets to identify which gene class each sequence belongs to. We converted the sequences into substrings of a defined length; this length, the 'k value', is one way to analyze the sequence.
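The conversion of a sequence into substrings of length k can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `kmers` and `kmer_counts` are hypothetical, and the use of overlapping windows with a step of one nucleotide is an assumption, since the abstract states only that the substrings have a defined length.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> list[str]:
    """Return the overlapping substrings of length k in a DNA sequence.

    Assumption: a sliding window with step 1. The abstract does not say
    whether the substrings overlap.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count how often each k-length substring occurs in the sequence."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    example = "ATGCATGCA"  # made-up sequence for illustration only
    print(kmers(example, 4))
    print(kmer_counts(example, 4))
```

Counts like these could serve as numeric features for a classifier. Because a single changed nucleotide alters every substring that spans it, the representation is sensitive to small mutations, which is consistent with the abstract's remark that a small change in the sequence can change the class.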