Ensemble deep learning and anomaly detection framework for automatic audio classification: Insights into deer vocalizations
Salem Ibrahim Salem, Sakae Shirayama, Sho Shimazaki, Kazuo Oki
Ecological Informatics, Volume 84, Article 102883 (published 2024-11-08). DOI: 10.1016/j.ecoinf.2024.102883. https://www.sciencedirect.com/science/article/pii/S1574954124004254
Abstract
Audio recordings have emerged as a pivotal tool in field observations, enriching environmental monitoring in both spatial and temporal dimensions. However, the richness and complexity of these recordings pose significant challenges, particularly when specific sound clips must be extracted from long recordings that contain ambient noise and other irrelevant sounds. Traditional methods, such as manual extraction or sliding a window over audio segments, are labor-intensive or computationally costly, which hinders practical bioacoustic applications. We therefore propose a framework that begins with a robust segmentation method for extracting sound clips that potentially contain deer vocalizations. This segmentation method relies on acoustic anomaly detection and can markedly improve computational efficiency, facilitating deployment in resource-limited environments. Subsequently, the isolated clips were classified into deer and non-deer categories using machine learning models. We assessed three state-of-the-art deep learning models, ResNet50, MobileNetV2, and EfficientNet-B2, under various hyperparameter configurations to optimize performance. We used 3,842 clips from two sites, Oze National Park and Taki, for training and testing. All models performed comparably, with median accuracies of 98.3% and 92.9% in the validation and testing stages, respectively. However, no single model outperformed the others across all evaluation metrics: ResNet50, in different configurations, achieved the best accuracy, F1 score, precision, and specificity, whereas MobileNetV2 achieved the best recall. To enhance the reliability of our classifications, we therefore adopted a consensus-based ensemble scoring system in which an audio clip was classified as a deer call when at least two of the three models agreed.
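The abstract does not state which anomaly detector the segmentation step uses. As a hedged illustration only, the sketch below substitutes a simple energy-based detector: it splits the signal into fixed-length frames, estimates the background level robustly with the median and the median absolute deviation (MAD), and keeps runs of frames whose RMS energy exceeds `median + k * MAD`. The function names and the parameters `frame_len`, `k`, and `min_frames` are hypothetical choices, not values from the paper.

```python
import math
from statistics import median


def frame_rms(samples, frame_len):
    """RMS energy of consecutive, non-overlapping frames.

    A trailing partial frame shorter than frame_len is dropped.
    """
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_segments(samples, sample_rate, frame_len=1024, k=5.0, min_frames=2):
    """Return (start_s, end_s) spans of frames whose energy is anomalous.

    Stand-in energy detector (an assumption, not the paper's method):
    a frame is anomalous if its RMS exceeds median + k * MAD of all frames.
    Runs shorter than min_frames frames are discarded as likely transients.
    """
    energies = frame_rms(samples, frame_len)
    if not energies:
        return []
    med = median(energies)
    # Median absolute deviation; the floor avoids a zero-width band on a flat background.
    mad = median(abs(e - med) for e in energies) or 1e-12
    threshold = med + k * mad

    segments, start = [], None
    # A trailing 0.0 sentinel closes any run still open at the end.
    for i, e in enumerate(energies + [0.0]):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            start = None
    return segments
```

Only the returned spans would then be passed to the classifiers, which is where the computational saving over scoring every sliding-window position would come from.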
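The consensus rule above is a 2-of-3 majority vote over binary decisions. A minimal sketch, assuming each model has already produced a 0/1 label per clip (1 = deer call); `consensus_vote` and `binary_metrics` are hypothetical helpers, and the labels below are toy values, not data from the study. The metric definitions are the standard confusion-matrix formulas for the five metrics named in the abstract.

```python
from typing import Sequence


def consensus_vote(*model_preds: Sequence[int], min_agree: int = 2) -> list[int]:
    """Label a clip 1 (deer) when at least min_agree models predict 1."""
    return [int(sum(votes) >= min_agree) for votes in zip(*model_preds)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Standard confusion-matrix metrics with 1 = deer as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # define 0/0 as 0 for empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy labels for six clips (not data from the study).
y_true = [1, 1, 0, 0, 1, 0]
resnet = [1, 0, 0, 1, 1, 0]
mobilenet = [1, 1, 1, 0, 1, 0]
effnet = [0, 1, 0, 0, 1, 1]
ensemble = consensus_vote(resnet, mobilenet, effnet)  # → [1, 1, 0, 0, 1, 0]
```

In this toy case each model errs on at least one clip, but the errors fall on different clips, so the majority vote recovers every label; this decorrelation of errors is the intuition behind combining models that lead on different metrics.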
Our findings demonstrated that the ensemble approach substantially improved classification performance, achieving an accuracy of 99.2% in the test stage. The proposed approach was successfully deployed during the deer rutting seasons in Oze (2019) and Taki (2021). By analyzing the frequency, timing, and duration of deer calls, we gained valuable insights into deer behavior. Additionally, the spatial distribution of deer calls in Taki enabled us to detect a breach in the city's protective fencing and revealed an association between the spatial patterns of deer calls and crop damage in two fields. We aimed to build a comprehensive picture of deer activity, with significant implications for both conservation efforts and the understanding of animal behavior in various habitats. The insights gathered from this research contribute to the scientific understanding of deer behavior and serve as a foundation for future studies and conservation initiatives. By incorporating advanced machine learning models into environmental monitoring, this work paves the way for more data-driven approaches in wildlife research.
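Once each deer-classified clip carries a timestamp, the call-level analysis described above reduces largely to aggregation. A minimal sketch, assuming each detection is a (start time, duration in seconds) pair; "frequency" is read here as how often calls occur (counts per hour of day), which is an interpretation, and `summarize_calls` is a hypothetical helper. The detections are made up for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def summarize_calls(detections):
    """Summarize deer-classified clips.

    detections: iterable of (start: datetime, duration_s: float) pairs.
    Returns the call count, calls per hour of day, and duration statistics.
    """
    detections = list(detections)
    per_hour = Counter(start.hour for start, _ in detections)
    durations = [dur for _, dur in detections]
    return {
        "n_calls": len(detections),
        "calls_per_hour_of_day": dict(sorted(per_hour.items())),
        "mean_duration_s": mean(durations) if durations else 0.0,
        "max_duration_s": max(durations, default=0.0),
    }


# Made-up detections for illustration only.
calls = [
    (datetime(2000, 10, 1, 18, 5), 1.5),
    (datetime(2000, 10, 1, 18, 40), 2.5),
    (datetime(2000, 10, 2, 3, 10), 2.0),
]
summary = summarize_calls(calls)
```

Grouping by recorder location instead of hour would give the kind of spatial summary the abstract uses for Taki; that variant is left out to keep the sketch short.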
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science, and biogeography. Its scope reflects the data-intensive nature of ecology, the growing capacity of information technology to access, harness, and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.