Unsupervised classification of graded animal vocalisations using fuzzy clustering

bioRxiv - Animal Behavior and Cognition | Pub Date: 2024-09-15 | DOI: 10.1101/2024.09.13.612808

Benjamin Benti, Patrick Jo Miller, Heike Vester, Florencia Noriega, Charlotte Cure



Abstract

We present here an unsupervised procedure for the classification of graded animal vocalisations based on Mel frequency cepstral coefficients and fuzzy clustering. Cepstral coefficients compress information about the distribution of energy across the frequency spectrum into a reduced number of variables and are well defined for signals of various acoustic characteristics (tonal, pulsed, or broadband). In addition, the Mel scale mimics the logarithmic perception of pitch by mammalian ears and is therefore well suited to defining meaningful perceptual categories for mammals. Fuzzy clustering is a soft classification approach. It does not assign samples to a single category, but rather describes their position relative to overlapping categories. This method is capable of identifying, in a quantitative way, both stereotyped vocalisations (vocalisations located in a single category) and graded vocalisations (vocalisations that lie between categories). We evaluated the performance of this procedure on a set of long-finned pilot whale (Globicephala melas) calls. We compared our results with a call catalogue previously defined through audio-visual inspection of the calls by human experts. Our unsupervised classification achieved slightly lower precision than the catalogue approach: we described between two and ten fuzzy clusters compared to 11 call types in the catalogue. The fuzzy clustering did not replicate the manual classification. One-to-one correspondences between fuzzy clusters and catalogue call types were rare; however, the same sets of call types were consistently grouped together within fuzzy clusters. There were also discrepancies between the two classification approaches, with some catalogue call types being consistently spread over several fuzzy clusters. Compared to manual classification, the fuzzy clustering approach proved to be much less time-consuming (days vs. months) and provided additional quantitative information about the graded nature of the vocalisations. We discuss the scope of our unsupervised classifier and the need to investigate the functions of call gradation in future research.
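
The soft-classification idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fuzzy clustering algorithm or settings the authors used, so everything below is an assumption made for illustration: standard fuzzy c-means with fuzzifier `m = 2`, synthetic two-dimensional points standing in for Mel frequency cepstral coefficient vectors, and a hypothetical "typicality" score (the largest membership of each sample) used to separate stereotyped from graded samples.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (an assumed variant, not necessarily the paper's).

    Returns (centroids, U), where U[i, k] is the membership of sample i in
    cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centroids are means of the samples weighted by membership ** m.
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Membership update: closer centroids receive larger membership.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids, U


# Synthetic stand-in for call features: two tight groups ("stereotyped")
# plus one sample placed halfway between them ("graded").
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[4.0, 0.0], scale=0.3, size=(50, 2))
intermediate = np.array([[2.0, 0.0]])
X = np.vstack([group_a, group_b, intermediate])

centroids, U = fuzzy_c_means(X, n_clusters=2)

# Hypothetical typicality score: the largest membership of each sample.
# Values near 1 suggest a stereotyped sample; values near 1 / n_clusters
# suggest a sample lying between categories.
typicality = U.max(axis=1)
print("typicality of the intermediate sample:", round(float(typicality[-1]), 3))
print("lowest typicality within the two groups:", round(float(typicality[:-1].min()), 3))
```

In this toy setting the sample placed between the two groups receives memberships close to one half in each cluster, whereas samples inside a group receive a membership close to 1 in one cluster. This is the kind of quantitative gradation information a hard clustering would discard.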


