Ensemble deep learning and anomaly detection framework for automatic audio classification: Insights into deer vocalizations

IF 5.8; Zone 2, Environmental Science and Ecology; Q1, Ecology; Ecological Informatics; Pub Date: 2024-11-08; DOI: 10.1016/j.ecoinf.2024.102883
Salem Ibrahim Salem, Sakae Shirayama, Sho Shimazaki, Kazuo Oki
Ecological Informatics, volume 84, Article 102883 (journal article). https://www.sciencedirect.com/science/article/pii/S1574954124004254
Citations: 0

Abstract

Audio recordings have emerged as a pivotal tool in field observations, enriching environmental monitoring in both the spatial and temporal dimensions. However, the richness and complexity of these recordings pose significant challenges, primarily when extracting specific sound clips from long recordings owing to the presence of ambient noise and other irrelevant sounds. Traditional methods, such as manual extraction or a sliding window over audio segments, hinder practical bioacoustic applications. Therefore, we propose a framework that begins with a robust segmentation method for extracting sound clips that potentially contain deer vocalizations. This segmentation method relies on acoustic anomaly detection and can markedly improve computational efficiency, facilitating deployment in environments with limited resources. Subsequently, the isolated clips were classified into deer and non-deer categories using machine learning models. Our investigation assessed three state-of-the-art deep learning models, ResNet50, MobileNetV2, and EfficientNet-B2, considering various hyperparameter configurations to optimize the performance. We utilized 3842 clips from two sites, Oze National Park and Taki, for training and testing. The outcomes demonstrated that all models exhibited comparable performances, with median accuracies of 98.3 % and 92.9 % during the validation and testing stages, respectively. However, no single model outperformed the others across all the evaluation metrics. For instance, ResNet50 in different configurations led to the best accuracy, F1 score, precision, and specificity, whereas MobileNetV2 had the best recall. Therefore, we adopted a consensus-based ensemble scoring system in which an audio clip was classified as a deer call when at least two of three models concurred in their classification to enhance the reliability of our classifications. 
Our findings demonstrated that the Ensemble approach significantly enhanced the classification performance, achieving an accuracy of 99.2 % in the test stage. The proposed approach was successfully deployed during the deer rutting seasons in Oze and Taki in 2019 and 2021, respectively. We gained invaluable insights into deer behavior by analyzing deer calls' frequency, timing, and duration. Additionally, the spatial distribution of deer calls in Taki enabled us to detect a breach in the city's protective fencing and an association between the spatial patterns of deer calls and crop damage in the two fields. We aimed to draw a comprehensive picture of deer activity, which has significant implications for both conservation efforts and understanding animal behavior in various habitats. The insights gathered from this research contribute to the scientific understanding of deer behavior and serve as a foundation for future studies and conservation initiatives. By incorporating advanced machine learning models into environmental monitoring, we have paved the way for more data-driven approaches in wildlife research.
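
The consensus rule described in the abstract (a clip counts as a deer call when at least two of the three models agree) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `majority_vote` and `binary_metrics`, the 0/1 label encoding, and the toy predictions are all assumptions introduced here. The metrics computed are the ones the abstract names (accuracy, precision, recall, specificity, F1 score).

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]], min_agree: int = 2) -> list[int]:
    """Combine per-model binary labels (1 = deer, 0 = non-deer).

    predictions[m][i] is model m's label for clip i. A clip is labelled
    deer when at least `min_agree` models predict deer (hypothetical helper).
    """
    n_clips = len(predictions[0])
    if any(len(p) != n_clips for p in predictions):
        raise ValueError("every model must label the same number of clips")
    return [
        1 if sum(p[i] for p in predictions) >= min_agree else 0
        for i in range(n_clips)
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute confusion-matrix metrics for the deer (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels for five clips (made up for illustration, not data from the paper).
resnet50 = [1, 0, 1, 1, 0]
mobilenet_v2 = [1, 1, 0, 1, 0]
efficientnet_b2 = [0, 0, 1, 1, 1]

ensemble = majority_vote([resnet50, mobilenet_v2, efficientnet_b2])
metrics = binary_metrics([1, 0, 0, 1, 0], ensemble)
```

With three voters and a threshold of two, a single model's error on a clip is outvoted whenever the other two are correct, which is one plausible reason an ensemble can outperform each of its members.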

Source journal
Ecological Informatics (Environmental Science: Ecology)
CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days
Journal description: The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, as well as the critical need for informing sustainable management in view of global environmental and climate change. The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.