Species annotation using a k-mer based KNN model
Srushti Sangar, Prathamesh Kolage, Pritee Chunarkar-Patil
Bioinformation, vol. 20, no. 9, pp. 986-989, 30 September 2024. DOI: 10.6026/973206300200986

Bacterial identification is a critical process in microbiology, clinical diagnostics, environmental monitoring, and food safety. Machine learning holds great promise for improving bacterial identification by increasing accuracy, speed, and scalability. However, challenges such as data dependency, model interpretability, and computational demands must be addressed before its full potential can be realized. The k-mer-based bacterial identification algorithm described here is an attempt to address these issues. Sequences are matched using the k-nearest neighbours (KNN) technique. The workflow comprised feature extraction, dataset preparation, classifier training, and label prediction based on the similarity of k-mer frequency distributions. The algorithm's performance was assessed with metrics including precision and F1 score, and it reached an accuracy of 93%.
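The abstract names the pipeline steps (k-mer feature extraction, classifier training, similarity-based label prediction, and evaluation) but not their parameters. The following is a minimal, standard-library-only Python sketch of that kind of pipeline, written under our own assumptions rather than the authors' settings: k = 3, an ACGT alphabet, normalised k-mer frequency vectors compared by Euclidean distance, a majority vote over 3 neighbours, and macro-averaged precision and F1. The names `kmer_profile`, `KmerKNN`, and `evaluate`, and the toy sequences, are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return the normalised frequency of every DNA k-mer (4**k entries, ACGT order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are left out of the vector and the total
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def euclidean(a, b):
    """Distance between two k-mer profiles (assumed metric; cosine would also work)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KmerKNN:
    """Hypothetical minimal KNN species classifier on k-mer profiles."""

    def __init__(self, k=3, n_neighbors=3):
        self.k = k
        self.n_neighbors = n_neighbors

    def fit(self, sequences, labels):
        # KNN is a lazy learner: "training" only stores the reference profiles
        self.profiles = [kmer_profile(s, self.k) for s in sequences]
        self.labels = list(labels)
        return self

    def predict_one(self, seq):
        query = kmer_profile(seq, self.k)
        # Rank every reference sequence by distance to the query profile
        ranked = sorted(zip((euclidean(query, p) for p in self.profiles), self.labels))
        votes = Counter(label for _, label in ranked[: self.n_neighbors])
        best = max(votes.values())
        # Tie-break: among the top-voted labels, return the one ranked nearest
        for _, label in ranked:
            if votes[label] == best:
                return label

    def predict(self, sequences):
        return [self.predict_one(s) for s in sequences]


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision and F1 over all observed labels."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / len(classes),
            "f1": sum(f1s) / len(classes)}


if __name__ == "__main__":
    # Toy, hypothetical reference set: AT-rich "species_A" vs GC-rich "species_B"
    train = ["ATATATATATAT", "AATTAATTAATT", "ATTATAATTATA",
             "GCGCGCGCGCGC", "GGCCGGCCGGCC", "GCCGCGGCGCCG"]
    labels = ["species_A"] * 3 + ["species_B"] * 3
    model = KmerKNN(k=3, n_neighbors=3).fit(train, labels)
    preds = model.predict(["TATATATAATAT", "CGCGCGGCGCGC"])
    print(preds)  # ['species_A', 'species_B']
    print(evaluate(["species_A", "species_B"], preds))  # {'accuracy': 1.0, 'precision': 1.0, 'f1': 1.0}
```

Because KNN defers all work to prediction time, each query is compared against every stored reference profile, so prediction cost grows linearly with the reference set. This is one concrete source of the computational demands mentioned in the abstract; real pipelines often use larger k and indexed nearest-neighbour search instead of this brute-force scan.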