Self-supervised Representation Learning for Speech Processing

Hung-yi Lee, Abdel-rahman Mohamed, Shinji Watanabe, Tara N. Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff
{"title":"Self-supervised Representation Learning for Speech Processing","authors":"Hung-yi Lee, Abdel-rahman Mohamed, Shinji Watanabe, Tara N. Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, K. Kirchhoff","doi":"10.18653/v1/2022.naacl-tutorials.2","DOIUrl":null,"url":null,"abstract":"There is a trend in the machine learning community to adopt self-supervised approaches to pre-train deep networks. Self-supervised representation learning (SSL) utilizes proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training data from unlabeled corpora. BERT and GPT in NLP and SimCLR and BYOL in CV are famous examples in this direction. These approaches make it possible to use a tremendous amount of unlabeled data available on the web to train large networks and solve complicated tasks. Thus, SSL has the potential to scale up current machine learning technologies, especially for low-resourced, under-represented use cases, and democratize the technologies. Recently self-supervised approaches for speech processing are also gaining popularity. There are several workshops in relevant topics hosted at ICML 2020 (https://icml-sas.gitlab.io/), NeurIPS 2020 (https://neurips-sas-2020.github.io/), and AAAI 2022 (https://aaai-sas-2022.github.io/). However, there is no previous tutorial about a similar topic based on the authors’ best knowledge. Due to the growing popularity of SSL, and the shared mission of the areas in bringing speech and language technologies to more use cases with better quality and scaling the technologies for under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievement in speech processing. The proposed tutorial is highly relevant to the special theme of ACL about language diversity. One of the main focuses of the tutorial is leveraging SSL to reduce the dependence of speech technologies on labeled data, and to scale up the technologies especially for under-represented languages and use cases.","PeriodicalId":408563,"journal":{"name":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.naacl-tutorials.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

There is a growing trend in the machine learning community of adopting self-supervised approaches to pre-train deep networks. Self-supervised representation learning (SSL) uses proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training signal from unlabeled corpora. BERT and GPT in NLP and SimCLR and BYOL in CV are famous examples in this direction. These approaches make it possible to use the tremendous amount of unlabeled data available on the web to train large networks and solve complicated tasks. SSL thus has the potential to scale up current machine learning technologies, especially for low-resourced, under-represented use cases, and to democratize these technologies. Recently, self-supervised approaches for speech processing have also been gaining popularity, and several workshops on relevant topics have been hosted at ICML 2020 (https://icml-sas.gitlab.io/), NeurIPS 2020 (https://neurips-sas-2020.github.io/), and AAAI 2022 (https://aaai-sas-2022.github.io/). To the best of the authors' knowledge, however, there has been no previous tutorial on a similar topic. Given the growing popularity of SSL, and the shared mission of the speech and language communities to bring their technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing. The proposed tutorial is highly relevant to the ACL special theme of language diversity. One of its main focuses is leveraging SSL to reduce the dependence of speech technologies on labeled data and to scale these technologies up, especially for under-represented languages and use cases.
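The contrastive proxy task mentioned above (distinguishing the true signal from distractors) can be made concrete in a few lines. Below is a minimal, self-contained PyTorch sketch, not code from the tutorial itself: the encoder's output at a masked time step must identify the true latent representation of that step among distractors drawn from other positions. All names here (`contrastive_ssl_loss`, the tensor shapes, the temperature value) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_ssl_loss(context, targets, temperature=0.1):
    """context: (batch, dim) encoder outputs at masked positions.
    targets: (batch, dim) true latent representations of those positions.
    Each row's distractors are the targets belonging to the other rows."""
    # Cosine similarity between every context vector and every target: (batch, batch).
    logits = F.cosine_similarity(
        context.unsqueeze(1), targets.unsqueeze(0), dim=-1
    ) / temperature
    # The matching target for row i sits at column i, so the class label
    # for the cross-entropy is simply the row index.
    labels = torch.arange(context.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage: random tensors stand in for a real speech encoder's outputs.
ctx = torch.randn(8, 256, requires_grad=True)  # context outputs at masked frames
tgt = torch.randn(8, 256)                      # latent targets for those frames
loss = contrastive_ssl_loss(ctx, tgt)
loss.backward()
```

In practice, speech SSL systems such as wav2vec 2.0 sample distractors from other masked positions within the same utterance and add auxiliary terms (e.g., codebook diversity); the in-batch sampling above is just the simplest runnable form of the idea.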