Self-supervised Representation Learning for Speech Processing

Hung-yi Lee, Abdel-rahman Mohamed, Shinji Watanabe, Tara N. Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff
{"title":"Self-supervised Representation Learning for Speech Processing","authors":"Hung-yi Lee, Abdel-rahman Mohamed, Shinji Watanabe, Tara N. Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, K. Kirchhoff","doi":"10.18653/v1/2022.naacl-tutorials.2","DOIUrl":null,"url":null,"abstract":"There is a trend in the machine learning community to adopt self-supervised approaches to pre-train deep networks. Self-supervised representation learning (SSL) utilizes proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training data from unlabeled corpora. BERT and GPT in NLP and SimCLR and BYOL in CV are famous examples in this direction. These approaches make it possible to use a tremendous amount of unlabeled data available on the web to train large networks and solve complicated tasks. Thus, SSL has the potential to scale up current machine learning technologies, especially for low-resourced, under-represented use cases, and democratize the technologies. Recently self-supervised approaches for speech processing are also gaining popularity. There are several workshops in relevant topics hosted at ICML 2020 (https://icml-sas.gitlab.io/), NeurIPS 2020 (https://neurips-sas-2020.github.io/), and AAAI 2022 (https://aaai-sas-2022.github.io/). However, there is no previous tutorial about a similar topic based on the authors’ best knowledge. Due to the growing popularity of SSL, and the shared mission of the areas in bringing speech and language technologies to more use cases with better quality and scaling the technologies for under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievement in speech processing. The proposed tutorial is highly relevant to the special theme of ACL about language diversity. One of the main focuses of the tutorial is leveraging SSL to reduce the dependence of speech technologies on labeled data, and to scale up the technologies especially for under-represented languages and use cases.","PeriodicalId":408563,"journal":{"name":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.naacl-tutorials.2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

There is a growing trend in the machine learning community of adopting self-supervised approaches to pre-train deep networks. Self-supervised representation learning (SSL) uses proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training signal from unlabeled corpora. BERT and GPT in NLP and SimCLR and BYOL in CV are famous examples in this direction. These approaches make it possible to use the tremendous amount of unlabeled data available on the web to train large networks and solve complicated tasks. SSL thus has the potential to scale up current machine learning technologies, especially for low-resourced, under-represented use cases, and to democratize these technologies. Recently, self-supervised approaches for speech processing have also been gaining popularity, and several workshops on relevant topics have been hosted at ICML 2020 (https://icml-sas.gitlab.io/), NeurIPS 2020 (https://neurips-sas-2020.github.io/), and AAAI 2022 (https://aaai-sas-2022.github.io/). To the best of the authors' knowledge, however, there has been no previous tutorial on a similar topic. Given the growing popularity of SSL, and the shared mission of the speech and language communities to bring their technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing. The proposed tutorial is highly relevant to the ACL special theme of language diversity. One of its main focuses is leveraging SSL to reduce the dependence of speech technologies on labeled data and to scale these technologies up, especially for under-represented languages and use cases.
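The contrastive proxy task mentioned above (distinguishing the true signal from distractors) can be made concrete in a few lines. Below is a minimal, self-contained PyTorch sketch, not code from the tutorial itself: the encoder's output at a masked time step must identify the true latent representation of that step among distractors drawn from other positions. All names here (`contrastive_ssl_loss`, the tensor shapes, the temperature value) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_ssl_loss(context, targets, temperature=0.1):
    """context: (batch, dim) encoder outputs at masked positions.
    targets: (batch, dim) true latent representations of those positions.
    Each row's distractors are the targets belonging to the other rows."""
    # Cosine similarity between every context vector and every target: (batch, batch).
    logits = F.cosine_similarity(
        context.unsqueeze(1), targets.unsqueeze(0), dim=-1
    ) / temperature
    # The matching target for row i sits at column i, so the class label
    # for the cross-entropy is simply the row index.
    labels = torch.arange(context.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage: random tensors stand in for a real speech encoder's outputs.
ctx = torch.randn(8, 256, requires_grad=True)  # context outputs at masked frames
tgt = torch.randn(8, 256)                      # latent targets for those frames
loss = contrastive_ssl_loss(ctx, tgt)
loss.backward()
```

In practice, speech SSL systems such as wav2vec 2.0 sample distractors from other masked positions within the same utterance and add auxiliary terms (e.g., codebook diversity); the in-batch sampling above is just the simplest runnable form of the idea.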