An Effective Hierarchical Graph Attention Network Modeling Approach for Pronunciation Assessment

IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1; Zone 2, Computer Science; JCR Q1, Acoustics). Pub Date: 2024-08-26. DOI: 10.1109/TASLP.2024.3449111
Bi-Cheng Yan; Berlin Chen
Volume 32, pp. 3974-3985. Journal Article. URL: https://ieeexplore.ieee.org/document/10648884/
Citations: 0

Abstract

Automatic pronunciation assessment (APA) aims to quantify second language (L2) learners’ pronunciation proficiency in a target language by providing fine-grained feedback with multiple aspect scores (e.g., accuracy, fluency, and completeness) at various linguistic levels (i.e., phone, word, and utterance). Most existing efforts follow a parallel modeling framework, which takes a sequence of phone-level pronunciation feature embeddings of a learner's utterance as input and then predicts multiple aspect scores across the linguistic levels. However, these approaches neither take the hierarchy of linguistic units into account nor explicitly consider the relatedness among the pronunciation aspects. In light of this, we put forward an effective modeling approach for APA, termed HierGAT, which is built on a hierarchical graph attention network. Our approach models the input utterance hierarchically as a heterogeneous graph that contains linguistic nodes at various levels of granularity. Through a carefully designed hierarchical graph message passing mechanism, the model captures intricate interdependencies within and across linguistic levels and also factors in the language hierarchy of the utterance. Furthermore, we design a novel aspect attention module to encode relatedness among aspects. To our knowledge, we are the first to introduce multiple types of linguistic nodes into graph-based neural networks for APA and to perform a comprehensive qualitative analysis of their merits. A series of experiments conducted on the speechocean762 benchmark dataset suggests the feasibility and effectiveness of our approach in relation to several competitive baselines.
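As a rough illustration of the idea described in the abstract, the sketch below builds a phone/word/utterance graph for one utterance and applies a single generic graph-attention step over it. It is not the authors' HierGAT implementation: the function names (`build_hierarchy`, `gat_layer`), the edge scheme, the ARPAbet-style phone labels, and the random node features are all assumptions made for illustration, and the attention layer is the standard single-head GAT formulation, which ignores node and edge types.

```python
import numpy as np

def build_hierarchy(words):
    """Build a phone/word/utterance graph (hypothetical edge scheme).

    words: list of (word, [phones]) pairs.
    Returns node labels, node types, and a symmetric boolean adjacency
    matrix with self-loops.
    """
    labels, types, edges = ["<utt>"], ["utterance"], []
    utt, prev_word = 0, None
    for word, phones in words:
        w = len(labels)
        labels.append(word)
        types.append("word")
        edges.append((w, utt))                 # word belongs to utterance
        if prev_word is not None:
            edges.append((prev_word, w))       # adjacent words
        prev_word, prev_phone = w, None
        for p in phones:
            i = len(labels)
            labels.append(p)
            types.append("phone")
            edges.append((i, w))               # phone belongs to word
            if prev_phone is not None:
                edges.append((prev_phone, i))  # adjacent phones in a word
            prev_phone = i
    n = len(labels)
    adj = np.eye(n, dtype=bool)                # self-loops
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    return labels, types, adj

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph-attention step (generic GAT, not HierGAT)."""
    z = h @ W                                  # projected features, (n, d)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed via broadcasting
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)              # attend only to neighbors
    e = e - e.max(axis=1, keepdims=True)       # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

# Toy utterance; the phone sequences are illustrative only.
words = [("good", ["G", "UH", "D"]),
         ("morning", ["M", "AO", "R", "N", "IH", "NG"])]
labels, types, adj = build_hierarchy(words)

rng = np.random.default_rng(0)
h = rng.normal(size=(len(labels), 8))          # stand-in node features
W = rng.normal(size=(8, 4))
a = rng.normal(size=(8,))
out, alpha = gat_layer(h, adj, W, a)
```

In this sketch a word node aggregates from its phones, its neighboring words, and the utterance node in one step, which is one simple way to pass messages both within and across levels. A type-aware model would presumably use separate parameters per node or edge type, and would add score heads per aspect; neither is shown here.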
Source journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing (Acoustics; Engineering, Electrical & Electronic)
CiteScore: 11.30
Self-citation rate: 11.10%
Publication volume: 217
About the journal: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.