Jong Hyun Shin,Sun Ju Kim,Gwanghun Kim,Hang-Rae Kim,Kwan Soo Ko
{"title":"Machine Learning Using Template-BasedPredicted Structure of Haemagglutinin Predicts Pathogenicity of Avian Influenza.","authors":"Jong Hyun Shin,Sun Ju Kim,Gwanghun Kim,Hang-Rae Kim,Kwan Soo Ko","doi":"10.4014/jmb.2405.05022","DOIUrl":null,"url":null,"abstract":"Deep learning presents a promising approach to complex biological classifications, contingent upon the availability of well-curated datasets. This study addresses the challenge of analyzing threedimensional protein structures by introducing a novel pipeline that utilizes open-source tools to convert protein structures into a format amenable to computational analysis. Applying a twodimensional convolutional neural network (CNN) to a dataset of 12,143 avian influenza virus genomes from 64 countries, encompassing 119 hemagglutinin (HA) and neuraminidase (NA) types, we achieved significant classification accuracy. The pathogenicity was determined based on the presence of H5 or H7 subtypes, and our models, ranging from zero to six mid-layers, indicated that a four-layer model most effectively identified highly pathogenic strains, with accuracies over 0.9. Enhancing our approach, we incorporated Principal Component Analysis (PCA) for dimensionality reduction and one-class SVM for abnormality detection, improving model robustness through bootstrapping. Furthermore, the K-nearest neighbor (K-NN) algorithm was fine-tuned via hyperparameter optimization to corroborate the findings. The PCA identified distinct clustering for pathogenic HA, yielding an AUC of up to 0.85. The optimized K-NN model demonstrated an impressive accuracy between 0.96 and 0.97. These combined methodologies underscore our deep learning framework's capacity for rapid and precise identification of pathogenic avian influenza strains, thus providing a critical tool for managing global avian influenza threats.","PeriodicalId":16481,"journal":{"name":"Journal of microbiology and biotechnology","volume":null,"pages":null},"PeriodicalIF":2.5000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of microbiology and biotechnology","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.4014/jmb.2405.05022","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Deep learning presents a promising approach to complex biological classifications, contingent upon the availability of well-curated datasets. This study addresses the challenge of analyzing threedimensional protein structures by introducing a novel pipeline that utilizes open-source tools to convert protein structures into a format amenable to computational analysis. Applying a twodimensional convolutional neural network (CNN) to a dataset of 12,143 avian influenza virus genomes from 64 countries, encompassing 119 hemagglutinin (HA) and neuraminidase (NA) types, we achieved significant classification accuracy. The pathogenicity was determined based on the presence of H5 or H7 subtypes, and our models, ranging from zero to six mid-layers, indicated that a four-layer model most effectively identified highly pathogenic strains, with accuracies over 0.9. Enhancing our approach, we incorporated Principal Component Analysis (PCA) for dimensionality reduction and one-class SVM for abnormality detection, improving model robustness through bootstrapping. Furthermore, the K-nearest neighbor (K-NN) algorithm was fine-tuned via hyperparameter optimization to corroborate the findings. The PCA identified distinct clustering for pathogenic HA, yielding an AUC of up to 0.85. The optimized K-NN model demonstrated an impressive accuracy between 0.96 and 0.97. These combined methodologies underscore our deep learning framework's capacity for rapid and precise identification of pathogenic avian influenza strains, thus providing a critical tool for managing global avian influenza threats.
期刊介绍:
The Journal of Microbiology and Biotechnology (JMB) is a monthly international journal devoted to the advancement and dissemination of scientific knowledge pertaining to microbiology, biotechnology, and related academic disciplines. It covers various scientific and technological aspects of Molecular and Cellular Microbiology, Environmental Microbiology and Biotechnology, Food Biotechnology, and Biotechnology and Bioengineering (subcategories are listed below). Launched in March 1991, the JMB is published by the Korean Society for Microbiology and Biotechnology (KMB) and distributed worldwide.