ESPnet-EZ:仅使用 Python 的 ESPnet,易于微调和集成

Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe
{"title":"ESPnet-EZ:仅使用 Python 的 ESPnet,易于微调和集成","authors":"Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe","doi":"arxiv-2409.09506","DOIUrl":null,"url":null,"abstract":"We introduce ESPnet-EZ, an extension of the open-source speech processing\ntoolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ\nfocuses on two major aspects: (i) easy fine-tuning and inference of existing\nESPnet models on various tasks and (ii) easy integration with popular deep\nneural network frameworks such as PyTorch-Lightning, Hugging Face transformers\nand datasets, and Lhotse. By replacing ESPnet design choices inherited from\nKaldi with a Python-only, Bash-free interface, we dramatically reduce the\neffort required to build, debug, and use a new model. For example, to fine-tune\na speech foundation model, ESPnet-EZ, compared to ESPnet, reduces the number of\nnewly written code by 2.7x and the amount of dependent code by 6.7x while\ndramatically reducing the Bash script dependencies. The codebase of ESPnet-EZ\nis publicly available.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration\",\"authors\":\"Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe\",\"doi\":\"arxiv-2409.09506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce ESPnet-EZ, an extension of the open-source speech processing\\ntoolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ\\nfocuses on two major aspects: (i) easy fine-tuning and inference of existing\\nESPnet models on various tasks and (ii) easy integration with popular deep\\nneural network frameworks such as PyTorch-Lightning, Hugging Face transformers\\nand datasets, and Lhotse. By replacing ESPnet design choices inherited from\\nKaldi with a Python-only, Bash-free interface, we dramatically reduce the\\neffort required to build, debug, and use a new model. For example, to fine-tune\\na speech foundation model, ESPnet-EZ, compared to ESPnet, reduces the number of\\nnewly written code by 2.7x and the amount of dependent code by 6.7x while\\ndramatically reducing the Bash script dependencies. The codebase of ESPnet-EZ\\nis publicly available.\",\"PeriodicalId\":501178,\"journal\":{\"name\":\"arXiv - CS - Sound\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Sound\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.09506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

我们介绍的 ESPnet-EZ 是开源语音处理工具包 ESPnet 的扩展,旨在快速、轻松地开发语音模型。ESPnet-EZ 主要关注两个方面:(i) 在各种任务中轻松微调和推断现有的 ESPnet 模型;(ii) 轻松集成流行的深度神经网络框架,如 PyTorch-Lightning、Hugging Face transformersand datasets 和 Lhotse。通过用纯 Python、无 Bash 界面取代从 Kaldi 继承而来的 ESPnet 设计选择,我们大大减少了构建、调试和使用新模型所需的工作量。例如,在微调语音基础模型时,ESPnet-EZ 与 ESPnet 相比,新编写代码的数量减少了 2.7 倍,依赖代码的数量减少了 6.7 倍,同时大大减少了对 Bash 脚本的依赖。ESPnet-EZ的代码库已经公开。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, and Lhotse. By replacing ESPnet design choices inherited from Kaldi with a Python-only, Bash-free interface, we dramatically reduce the effort required to build, debug, and use a new model. For example, to fine-tune a speech foundation model, ESPnet-EZ, compared to ESPnet, reduces the number of newly written code by 2.7x and the amount of dependent code by 6.7x while dramatically reducing the Bash script dependencies. The codebase of ESPnet-EZ is publicly available.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Prevailing Research Areas for Music AI in the Era of Foundation Models Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1