Robust Forest Sound Classification Using Pareto-Mordukhovich Optimized MFCC in Environmental Monitoring
Ahmad Qurthobi; Robertas Damaševičius; Vytautas Barzdaitis; Rytis Maskeliūnas
IEEE Access, vol. 13, pp. 20923-20944, published 2025-01-28
DOI: 10.1109/ACCESS.2025.3535796
URL: https://ieeexplore.ieee.org/document/10856116/
Citations: 0
Abstract
As a complex ecosystem of flora and fauna, the forest has always been vulnerable to threats. Previous studies used environmental audio collections, such as the ESC-50 and UrbanSound8K datasets, as proxies for sounds that may occur in forests. This study applies deep learning models to forest sound classification as a step toward an early threat detection system. The research evaluates several pre-trained deep learning models, including MobileNet, GoogleNet, and ResNet, on the limited FSC22 dataset, which consists of 2,025 forest sound recordings in 27 categories. To improve classification, the study introduces a hybrid model that combines a convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) layer, designed to capture both the spatial and the temporal features of the sound data. The research also uses Pareto-Mordukhovich-optimized Mel Frequency Cepstral Coefficients (MFCC) for feature extraction to improve the representation of the audio signals. Data augmentation and dimensionality reduction techniques were also explored to assess their impact on model performance. The results indicate that the proposed hybrid CNN-BiLSTM model significantly reduced classification loss and improved accuracy compared with the standalone pre-trained models. GoogleNet with an added BiLSTM layer and augmented data achieved an average loss of 0.7209 and an average accuracy of 0.7852, demonstrating its potential for classifying forest sounds. These improvements in loss and classification performance highlight the potential of hybrid models for environmental sound analysis, particularly when data are limited.
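The abstract does not describe how the Pareto-Mordukhovich optimization selects MFCC settings. As a rough illustration of the Pareto part only, the sketch below reduces a set of candidate MFCC configurations to its non-dominated (Pareto-optimal) subset under two assumed objectives: validation loss and the number of coefficients kept, used here as a proxy for feature size. The `Candidate` type, the choice of objectives, and every number in `EXAMPLE_CANDIDATES` are hypothetical and are not results from the paper; the Mordukhovich component of the authors' method is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One hypothetical MFCC configuration and its two objective values."""
    n_mfcc: int       # objective 1: number of cepstral coefficients (lower = more compact)
    val_loss: float   # objective 2: validation loss (lower = better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.n_mfcc <= b.n_mfcc and a.val_loss <= b.val_loss
    strictly_better = a.n_mfcc < b.n_mfcc or a.val_loss < b.val_loss
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order is preserved)."""
    # dominates(c, c) is always False, so comparing a candidate with itself is harmless.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Illustrative values only; they are NOT measurements from the FSC22 experiments.
EXAMPLE_CANDIDATES = [
    Candidate(n_mfcc=13, val_loss=0.95),
    Candidate(n_mfcc=20, val_loss=0.80),
    Candidate(n_mfcc=40, val_loss=0.78),
    Candidate(n_mfcc=40, val_loss=0.90),  # dominated by (20, 0.80) and (40, 0.78)
    Candidate(n_mfcc=60, val_loss=0.79),  # dominated by (40, 0.78)
]

if __name__ == "__main__":
    for c in pareto_front(EXAMPLE_CANDIDATES):
        print(c)
```

Running the script prints the three non-dominated configurations (13, 20, and 40 coefficients). A Pareto filter does not pick a single winner; a final choice among the front still needs a tie-breaking rule, for example the lowest loss within a feature-size budget.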
Journal: IEEE Access (Computer Science, Information Systems; Engineering, Electrical & Electronic)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal description:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.