Christian Bergler, Alexander Barnhill, Dominik Perrin, M. Schmitt, A. Maier, E. Nöth
{"title":"ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning","authors":"Christian Bergler, Alexander Barnhill, Dominik Perrin, M. Schmitt, A. Maier, E. Nöth","doi":"10.21437/interspeech.2022-846","DOIUrl":null,"url":null,"abstract":"Even today, the current understanding and interpretation of animal-specific vocalization paradigms is largely based on his-torical and manual data analysis considering comparatively small data corpora, primarily because of time- and human-resource limitations, next to the scarcity of available species-related machine-learning techniques. Partial human-based data inspections neither represent the overall real-world vocal reper-toire, nor the variations within intra- and inter animal-specific call type portfolios, typically resulting only in small collections of category-specific ground truth data. Modern machine (deep) learning concepts are an essential requirement to identify sta-tistically significant animal-related vocalization patterns within massive bioacoustic data archives. However, the applicability of pure supervised training approaches is challenging, due to limited call-specific ground truth data, combined with strong class-imbalances between individual call type events. The current study is the first presenting a deep bioacoustic signal generation framework, entitled ORCA-WHISPER, a Generative Adversarial Network (GAN), trained on low-resource killer whale ( Orcinus Orca ) call type data. Besides audiovisual in-spection, supervised call type classification, and model transferability, the auspicious quality of generated fake vocalizations was further demonstrated by visualizing, representing, and en-hancing the real-world orca signal data manifold. Moreover, previous orca/noise segmentation results were outperformed by integrating fake signals to the original data partition.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"2413-2417"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-846","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Even today, the current understanding and interpretation of animal-specific vocalization paradigms is largely based on his-torical and manual data analysis considering comparatively small data corpora, primarily because of time- and human-resource limitations, next to the scarcity of available species-related machine-learning techniques. Partial human-based data inspections neither represent the overall real-world vocal reper-toire, nor the variations within intra- and inter animal-specific call type portfolios, typically resulting only in small collections of category-specific ground truth data. Modern machine (deep) learning concepts are an essential requirement to identify sta-tistically significant animal-related vocalization patterns within massive bioacoustic data archives. However, the applicability of pure supervised training approaches is challenging, due to limited call-specific ground truth data, combined with strong class-imbalances between individual call type events. The current study is the first presenting a deep bioacoustic signal generation framework, entitled ORCA-WHISPER, a Generative Adversarial Network (GAN), trained on low-resource killer whale ( Orcinus Orca ) call type data. Besides audiovisual in-spection, supervised call type classification, and model transferability, the auspicious quality of generated fake vocalizations was further demonstrated by visualizing, representing, and en-hancing the real-world orca signal data manifold. Moreover, previous orca/noise segmentation results were outperformed by integrating fake signals to the original data partition.