从平面嵌入中交互式提取不同的发声单元，无需事先进行声音分割。

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Frontiers in bioinformatics Pub Date : 2023-01-13 eCollection Date: 2022-01-01 DOI:10.3389/fbinf.2022.966066

Corinna Lorenz, Xinyu Hao, Tomas Tomka, Linus Rüttimann, Richard H R Hahnloser

{"title":"从平面嵌入中交互式提取不同的发声单元，无需事先进行声音分割。","authors":"Corinna Lorenz, Xinyu Hao, Tomas Tomka, Linus Rüttimann, Richard H R Hahnloser","doi":"10.3389/fbinf.2022.966066","DOIUrl":null,"url":null,"abstract":"Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with minimal false positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances; and embeddings represent only objects of fixed dimensionality, which conflicts with vocalizations that have variable dimensions stemming from their variable durations. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves much lower false positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880044/pdf/","citationCount":"0","resultStr":"{\"title\":\"Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation.\",\"authors\":\"Corinna Lorenz, Xinyu Hao, Tomas Tomka, Linus Rüttimann, Richard H R Hahnloser\",\"doi\":\"10.3389/fbinf.2022.966066\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with minimal false positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances; and embeddings represent only objects of fixed dimensionality, which conflicts with vocalizations that have variable dimensions stemming from their variable durations. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves much lower false positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.\",\"PeriodicalId\":73066,\"journal\":{\"name\":\"Frontiers in bioinformatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2023-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880044/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fbinf.2022.966066\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fbinf.2022.966066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

注释和校对复杂自然行为（如发声）的数据集是一项繁琐的任务，因为需要从背景噪声中正确分割出特定行为的实例，并且必须以最小的误判率进行分类。事实证明，低维嵌入对这项任务非常有用，因为它们可以提供数据集的可视化概览，其中不同的行为出现在不同的聚类中。然而，低维嵌入会带来误差，因为它们无法保留距离；而且嵌入只表示固定维度的对象，这与发声相冲突，因为发声的持续时间不同，维度也不同。为了缓解这些问题，我们引入了一种半监督的分析方法，可同时对发声进行分割和聚类。我们通过在声音频谱图的嵌入平面上指定成对的高密度区域来定义给定的发声类型，其中一个区域与发声开始相关，另一个区域与发声结束相关。我们在对嵌入 UMAP 的成年斑马雀发声进行聚类的任务中演示了我们的双邻域（2N）提取方法。结果表明，2N 提取法可以从连续数据流中识别长短发声，而无需对数据进行特定的分割。此外，与基于单一定义区域的同类方法相比，2N 提取的误报率要低得多。除了我们的方法，我们还提供了一个图形用户界面（GUI），用于可视化和注释数据。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation.

Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with minimal false positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances; and embeddings represent only objects of fixed dimensionality, which conflicts with vocalizations that have variable dimensions stemming from their variable durations. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves much lower false positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助