Towards End-2-end Learning for Predicting Behavior Codes from Spoken Utterances in Psychotherapy Conversations.

Proceedings of the conference. Association for Computational Linguistics. Meeting Pub Date : 2020-07-01 DOI:10.18653/v1/2020.acl-main.351

Karan Singla, Zhuohao Chen, David C Atkins, Shrikanth Narayanan

Volume 2020, pages 3797-3803. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9901279/pdf/nihms-1858361.pdf


Abstract

Spoken language understanding tasks usually rely on pipelines of complex processing blocks such as voice activity detection, speaker diarization, and automatic speech recognition (ASR). We propose a novel framework that predicts utterance-level labels directly from speech features, removing the dependency on first generating transcripts and enabling transcription-free behavioral coding. Our classifier uses a pretrained Speech-2-Vector encoder as a bottleneck to generate word-level representations from speech features. This pretrained encoder learns to encode the speech features for a word using an objective similar to Word2Vec. The proposed approach uses only speech features and word segmentation information to predict utterance-level target labels. We show that our model achieves results competitive with state-of-the-art approaches that use transcribed text for the task of predicting psychotherapy-relevant behavior codes.
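The data flow described in the abstract (speech frames plus word boundaries, then word-level vectors, then an utterance-level label) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy features, and the label names are hypothetical, and mean pooling merely stands in for the pretrained Speech-2-Vector encoder and for the classifier, neither of which the abstract specifies in detail.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one acoustic feature vector per time step (toy values here)


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_word(frames: Sequence[Frame]) -> List[float]:
    """Placeholder for the pretrained Speech-2-Vector encoder.

    Assumption: mean pooling over the frames of one word. The real encoder
    is a trained network; this only shows where it sits in the pipeline.
    """
    return mean_pool(frames)


def predict_code(
    frames: Sequence[Frame],
    word_bounds: Sequence[Tuple[int, int]],  # (start, end) frame indices, end exclusive
    weights: Sequence[Sequence[float]],      # one weight row per label (hypothetical)
    labels: Sequence[str],
) -> str:
    """Predict one utterance-level label without any transcript."""
    # 1. Cut the frame sequence into words using the segmentation information.
    word_vecs = [encode_word(frames[start:end]) for start, end in word_bounds]
    # 2. Summarize the utterance (placeholder: average of word vectors).
    utt_vec = mean_pool(word_vecs)
    # 3. Score each label with a linear layer and return the best one.
    scores = [sum(w * x for w, x in zip(row, utt_vec)) for row in weights]
    return labels[scores.index(max(scores))]


# Toy utterance: four 2-dimensional frames, segmented into two words.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
word_bounds = [(0, 2), (2, 4)]
weights = [[1.0, 0.0], [0.0, 1.0]]
labels = ["code_a", "code_b"]  # hypothetical label names
predicted = predict_code(frames, word_bounds, weights, labels)
```

The point of the sketch is that no transcript appears anywhere in the pipeline: the only inputs are acoustic frames and word boundaries.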


