
Latest Publications in Computer Speech and Language

Enhancing analysis of diadochokinetic speech using deep neural networks
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-02 | DOI: 10.1016/j.csl.2024.101715

Diadochokinetic speech tasks (DDK) involve the repetitive production of consonant-vowel syllables. These tasks are useful for detecting impairments, supporting differential diagnosis, and monitoring progress in speech-motor disorders. However, manual analysis of these tasks is time-consuming, subjective, and provides only a rough picture of speech. This paper presents several deep neural network models that operate on the raw waveform for the automatic segmentation of stop consonants and vowels from unannotated and untranscribed speech. A deep encoder serves as a feature-extractor module, replacing conventional signal-processing features. In this context, diverse deep learning architectures, such as convolutional neural networks (CNNs) and large self-supervised models like HuBERT, are applied for the extraction process. A decoder model uses the derived embeddings to identify frame types. Accordingly, the paper studies diverse decoder architectures, ranging from linear layers and LSTMs to CNNs and transformers. These architectures are assessed for their ability to detect speech rate, sound duration, and boundary locations on a dataset of healthy individuals and an unseen dataset of older individuals with Parkinson’s Disease. The results reveal that an LSTM model performs better than all other models on both datasets and is comparable to trained human annotators.
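The description above maps naturally onto an encoder-decoder frame classifier over the raw waveform. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the layer sizes, strides, and three-way label set (stop consonant / vowel / silence) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a strided 1D CNN encoder over the raw
# waveform followed by a BiLSTM decoder that labels each frame.
import torch
import torch.nn as nn

class FrameSegmenter(nn.Module):
    def __init__(self, n_classes=3, hidden=256):
        super().__init__()
        # Strided 1D convolutions turn the raw waveform into frame embeddings.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=4, stride=2), nn.ReLU(),
        )
        # Bidirectional LSTM decoder predicts a label per frame
        # (e.g. stop consonant / vowel / silence).
        self.decoder = nn.LSTM(256, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, wav):                 # wav: (batch, samples)
        z = self.encoder(wav.unsqueeze(1))  # (batch, channels, frames)
        z = z.transpose(1, 2)               # (batch, frames, channels)
        h, _ = self.decoder(z)
        return self.classifier(h)           # (batch, frames, n_classes)

wav = torch.randn(2, 16000)                 # two 1-second utterances at 16 kHz
print(FrameSegmenter()(wav).shape)
```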

Citations: 0
Copiously Quote Classics: Improving Chinese Poetry Generation with historical allusion knowledge
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-30 | DOI: 10.1016/j.csl.2024.101708

Integrating allusions into poems is an advanced form of human poetry writing that can clearly express the author’s thoughts and resonate with readers. However, existing poetry generation work mainly focuses on improving the coherence and fluency of poetry, while generating poems with allusion knowledge is rarely considered. To address this issue, we propose an Allusion-aware Chinese Poetry Generation (ACPG) framework in this study. Concretely, we first release an Allusion-Enriched Poetry (AEP) dataset that links poems with historical allusions, which may open a new research direction for poetry generation. Based on this dataset, we design a three-stage learning mechanism to support training under a low-resource setting, which can effectively exploit the knowledge in large-scale poetry and allusion data to generate informative allusive poems. Extensive experiments demonstrate the effectiveness of ACPG against a series of proposed baselines. Moreover, the proposed ACPG framework can also be applied to lyrics generation or other controlled text generation tasks, incorporating allusion knowledge into the generated results and enhancing the meaning and quality of the texts.
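To make the three-stage schedule concrete, here is a schematic, self-contained sketch of training the same model successively on a large poetry corpus, on allusion-knowledge text, and finally on the smaller allusion-enriched pairs. The tiny model, synthetic batches, and epoch counts are placeholders, not the ACPG setup.

```python
# Schematic sketch of a staged, low-resource training schedule (assumptions only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(5000, 64), nn.Flatten(), nn.Linear(64 * 16, 5000))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
loss_fn = nn.CrossEntropyLoss()

def make_batches(n):  # placeholder for a real dataloader over token ids
    return [(torch.randint(0, 5000, (8, 16)), torch.randint(0, 5000, (8,))) for _ in range(n)]

stages = [
    ("stage 1: large-scale poetry corpus", make_batches(20)),
    ("stage 2: allusion knowledge", make_batches(10)),
    ("stage 3: allusion-enriched poems (AEP)", make_batches(5)),
]
for name, batches in stages:
    for x, y in batches:
        loss = loss_fn(model(x), y)   # next-token / generation loss stand-in
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(name, "done, last loss:", float(loss))
```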

Citations: 0
Significance of chirp MFCC as a feature in speech and audio applications
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.csl.2024.101713

A novel feature, based on the chirp z-transform, that offers an improved representation of the underlying true spectrum is proposed. This feature, the chirp MFCC, is derived by computing the Mel frequency cepstral coefficients from the chirp magnitude spectrum instead of the Fourier transform magnitude spectrum. The theoretical foundations of the proposal are discussed, together with an experimental validation using a product of likelihood Gaussians that shows the improved class separation offered by the proposed chirp MFCC compared with the basic MFCC. Further, a real-world evaluation of the feature is performed using three diverse tasks, namely speech-music classification, speaker identification, and speech commands recognition. In all three tasks, the proposed chirp MFCC offers considerable improvements.
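One plausible reading of the feature described above is sketched below: evaluate a chirp z-transform on a spiral contour instead of the DFT on the unit circle, then apply the usual mel filterbank, log, and DCT steps. The contour radius and all analysis parameters are assumptions rather than the paper's exact settings (uses SciPy 1.8+ for czt and librosa for the mel filterbank).

```python
# Hedged sketch of a "chirp MFCC": chirp z-transform magnitude spectrum,
# then mel filterbank + log + DCT, as for ordinary MFCCs.
import numpy as np
from scipy.signal import czt, get_window   # czt available in SciPy >= 1.8
from scipy.fftpack import dct
import librosa

def chirp_mfcc(frame, sr=16000, n_fft=512, n_mels=26, n_ceps=13, radius=1.02):
    n_bins = n_fft // 2 + 1
    # spiral contour step (radius is an illustrative assumption)
    w = np.exp(-2j * np.pi / n_fft) / radius ** (1.0 / n_fft)
    spec = czt(frame * get_window("hamming", len(frame)), m=n_bins, w=w, a=1.0)
    mag = np.abs(spec)                                   # chirp magnitude spectrum
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ mag + 1e-10)
    return dct(log_mel, type=2, norm="ortho")[:n_ceps]   # cepstral coefficients

frame = np.random.randn(400)          # one 25 ms frame at 16 kHz
print(chirp_mfcc(frame).shape)        # (13,)
```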

Citations: 0
Artificial disfluency detection, uh no, disfluency generation for the masses
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-17 | DOI: 10.1016/j.csl.2024.101711

Existing approaches for disfluency detection typically require large annotated datasets. However, current datasets for this task are limited, suffer from class imbalance, and lack some types of disfluencies that are encountered in real-world scenarios. At the same time, augmentation techniques for disfluency detection are not able to model complex types of disfluencies. This limits such approaches to pre-training only, since the generated data are not indicative of disfluencies that occur in real scenarios and, as a result, cannot be directly used for training disfluency detection models, as we experimentally demonstrate. This imposes significant constraints on the usefulness of such approaches in practice, since real disfluencies still have to be collected in order to train the models. In this work, we propose Large-scale ARtificial Disfluency Generation (LARD), a method for automatically generating artificial disfluencies, and more specifically repairs, from fluent text. Unlike existing augmentation techniques, LARD can simulate all the different and complex types of disfluencies. In addition, it incorporates contextual embeddings into the disfluency generation to produce realistic, context-aware artificial disfluencies. LARD can be used effectively for training disfluency detection models, bypassing the requirement for annotated disfluent data. Our empirical evaluation shows that LARD outperforms existing rule-based augmentation methods and increases the accuracy of existing disfluency detectors. In addition, experiments demonstrate that the proposed method can be used effectively in a low-resource setup.
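As a toy illustration of the repair-generation idea (not the LARD implementation), the snippet below turns a fluent sentence into a disfluent one by inserting a perturbed copy of a span plus an editing phrase; the substitution lexicon and interregnum list are invented for the example.

```python
# Toy "repair" disfluency generator: repeat a span with a perturbed word and an
# editing phrase so that fluent text yields labeled disfluent training data.
import random

INTERREGNUM = ["uh", "I mean", "sorry"]
SUBSTITUTES = {"cheapest": "fastest", "morning": "evening", "Boston": "Denver"}

def inject_repair(tokens, rng=random.Random(0)):
    # choose a word we know how to perturb; fall back to returning the input
    candidates = [i for i, t in enumerate(tokens) if t in SUBSTITUTES]
    if not candidates:
        return tokens
    i = rng.choice(candidates)
    start = max(0, i - 1)
    reparandum = tokens[start:i] + [SUBSTITUTES[tokens[i]]]   # the "wrong" span
    # fluent prefix + reparandum + interregnum + repair (the original span onward)
    return tokens[:start] + reparandum + [rng.choice(INTERREGNUM)] + tokens[start:]

print(" ".join(inject_repair("show me the cheapest morning flight to Boston".split())))
```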

Citations: 0
Deep multi-task learning based detection of correlated mental disorders using audio modality
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101710

The existence of correlation among mental disorders is a well-known phenomenon. Multi-task learning (MTL) has been reported to yield enhanced detection performance for a targeted mental disorder by leveraging its correlation with other related mental disorders, mainly in textual and visual modalities. Its validation on the audio modality is yet to be explored. In this study, we explore homogeneous and heterogeneous MTL paradigms for detecting two correlated mental disorders, namely major depressive disorder (MDD) and post-traumatic stress disorder (PTSD), on a publicly available audio dataset. The detection of each disorder is employed as an auxiliary task when the other is the main task, and a few other tasks are employed as additional auxiliary tasks. The results show that both MTL paradigms, implemented using two deep-learning models, outperformed the corresponding single-task learning (STL). The best relative improvements in the detection performance of MDD and PTSD are 29.9% and 28.8%, respectively. Furthermore, we analyzed the cross-corpus generalization of MTL using two distinct datasets that involve MDD/PTSD instances. The results indicate that the generalizability of MTL is significantly superior to that of STL. The best relative increments in the cross-corpus generalization performance of MDD and PTSD detection are 25.0% and 56.5%, respectively.
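The hard-parameter-sharing setup implied above can be pictured as a shared audio encoder with one head per disorder and a weighted sum of task losses. The snippet below is such a sketch; the feature dimensionality, GRU encoder, and loss weighting are assumptions, not the paper's configuration.

```python
# Minimal multi-task sketch: shared encoder, one head for the main task (MDD)
# and one for the auxiliary task (PTSD), trained on a weighted sum of losses.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, n_feats=40, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, batch_first=True)
        self.head_mdd = nn.Linear(hidden, 1)    # main task head
        self.head_ptsd = nn.Linear(hidden, 1)   # auxiliary task head

    def forward(self, x):                        # x: (batch, frames, n_feats)
        _, h = self.encoder(x)
        h = h.squeeze(0)
        return self.head_mdd(h), self.head_ptsd(h)

model = SharedEncoderMTL()
x = torch.randn(4, 200, 40)                      # e.g. 4 utterances of log-mel frames
y_mdd = torch.randint(0, 2, (4, 1)).float()
y_ptsd = torch.randint(0, 2, (4, 1)).float()
p_mdd, p_ptsd = model(x)
bce = nn.BCEWithLogitsLoss()
loss = bce(p_mdd, y_mdd) + 0.5 * bce(p_ptsd, y_ptsd)  # weighted multi-task loss
loss.backward()
```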

Citations: 0
Adaptive feature extraction for entity relation extraction
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101712

Effectively capturing semantic dependencies within sentences is pivotal for relation extraction. However, challenges such as feature sparsity and the complexity of identifying the structure of target entity pairs, both brought on by traditional feature-extraction methods, pose significant obstacles for relation extraction. Existing methods that rely on combined features or recurrent networks also face limitations, such as over-reliance on prior knowledge or the vanishing-gradient problem. To address these limitations, we propose an Adaptive Feature Extraction (AFE) method that combines neural networks with feature engineering to capture high-order abstract and long-distance semantic dependencies. Our approach extracts atomic features from sentences, maps them into distributed representations, and categorizes these representations into multiple mixed features through adaptive combination, setting it apart from other methods. The proposed AFE-based model uses four different convolutional layers to facilitate feature learning and weighting from the adaptive feature representations, thereby enhancing the discriminative power of deep networks for relation extraction. Experimental results on the English datasets ACE05 English and SciERC, and the Chinese datasets ACE05 Chinese and CLTC(SanWen), demonstrate the superiority of our method: F1 scores improved by 4.16%, 3.99%, 0.82%, and 1.60%, respectively. In summary, our AFE method provides a flexible and effective solution to several challenges in cross-domain and cross-language relation extraction.
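A rough sketch of the adaptive-combination idea is given below: several convolutional branches of different widths produce candidate features, and a learned gate mixes them before relation classification. The branch widths, pooling, and gating details are illustrative assumptions rather than the AFE architecture itself.

```python
# Illustrative sketch: parallel convolutional branches over token embeddings,
# adaptively weighted by a learned gate, feeding a relation classifier.
import torch
import torch.nn as nn

class AdaptiveConvRE(nn.Module):
    def __init__(self, vocab=10000, dim=128, n_relations=10, widths=(2, 3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.branches = nn.ModuleList(
            [nn.Conv1d(dim, dim, k, padding=k // 2) for k in widths])
        self.gate = nn.Linear(dim, len(widths))   # adaptive branch weights
        self.out = nn.Linear(dim, n_relations)

    def forward(self, ids):                       # ids: (batch, seq)
        e = self.embed(ids).transpose(1, 2)       # (batch, dim, seq)
        feats = torch.stack([b(e).amax(dim=2) for b in self.branches], dim=1)
        weights = torch.softmax(self.gate(e.mean(dim=2)), dim=-1)  # (batch, n_branches)
        mixed = (weights.unsqueeze(-1) * feats).sum(dim=1)         # weighted mix
        return self.out(mixed)

logits = AdaptiveConvRE()(torch.randint(0, 10000, (2, 30)))
print(logits.shape)                               # (2, 10)
```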

Citations: 0
A neural network approach for speech enhancement and noise-robust bandwidth extension
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101709

When noisy utterances with varying frequency bandwidths are processed by an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, a state-of-the-art speech enhancement method. We introduced a multi-scale loss and implemented a coordinate-embedded upsampling block to facilitate bandwidth extension while maintaining speech enhancement performance. Additionally, we proposed a gradient loss function to promote the neural network’s convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.
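One plausible form of such a gradient loss is sketched below: penalizing differences of the enhanced and reference magnitude spectrograms along the frequency and time axes in addition to a plain L1 term. This is an assumption about the formulation, not necessarily the paper's exact definition.

```python
# Hedged sketch of a spectrogram "gradient loss": L1 on the spectrogram plus L1
# on its first differences along frequency and time.
import torch
import torch.nn.functional as F

def gradient_loss(est, ref):
    # est, ref: (batch, freq_bins, frames) magnitude spectrograms
    d_f = F.l1_loss(est[:, 1:, :] - est[:, :-1, :], ref[:, 1:, :] - ref[:, :-1, :])
    d_t = F.l1_loss(est[:, :, 1:] - est[:, :, :-1], ref[:, :, 1:] - ref[:, :, :-1])
    return F.l1_loss(est, ref) + d_f + d_t

est = torch.rand(2, 257, 100, requires_grad=True)
ref = torch.rand(2, 257, 100)
loss = gradient_loss(est, ref)
loss.backward()
print(float(loss))
```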

Citations: 0
Syntax-controlled paraphrases generation with VAE and multi-task learning
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-08 | DOI: 10.1016/j.csl.2024.101705

Paraphrase generation is an important method for augmenting text data and plays a crucial role in Natural Language Generation (NLG). However, existing methods lack the ability to capture both the semantic representation of input sentences and the syntactic structure of exemplars, which can easily lead to problems such as redundant content, semantic inaccuracies, and poor diversity. To tackle these challenges, we propose a Syntax-Controlled Paraphrase Generator (SCPG), which utilizes attention networks and VAE-based hidden variables to model the semantics of input sentences and the syntax of exemplars. In addition, in order to achieve controllability of the target paraphrase structure, we propose a method for learning semantic and syntactic representations based on multi-task learning and successfully integrate the two through a gating mechanism. Extensive experimental results show that SCPG achieves SOTA results in terms of both semantic consistency and syntactic controllability, and is able to make a better trade-off between preserving semantics and producing novel sentence structure.
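The gating mechanism can be pictured as below: a sigmoid gate computed from the concatenated semantic and syntactic vectors interpolates between them before decoding. The dimensions and the exact fusion rule are assumptions for illustration, not the SCPG design itself.

```python
# Minimal sketch of gated fusion of a semantic vector (from the input sentence)
# and a syntactic vector (from the exemplar).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, semantic, syntactic):
        g = torch.sigmoid(self.gate(torch.cat([semantic, syntactic], dim=-1)))
        return g * semantic + (1 - g) * syntactic   # element-wise gating

fused = GatedFusion()(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)   # (4, 256), fed to the decoder that generates the paraphrase
```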

Citations: 0
A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-07 | DOI: 10.1016/j.csl.2024.101703

Spelling error correction is the task of identifying and rectifying misspelled words in texts. It is an active research topic in Natural Language Processing because of its numerous applications in human language understanding. Phonetically or visually similar yet semantically distinct characters make it an arduous task in any language. Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods, which we found rather inefficient. In particular, machine learning-based approaches, which exhibit superior performance to rule-based and statistical methods, are ineffective as they correct each character regardless of its appropriateness. In this paper, we address these issues by proposing DPCSpell, a novel detector-purificator-corrector framework based on denoising transformers. In addition, we present a method for large-scale corpus creation from scratch, which in turn resolves the resource limitation problem of any left-to-right scripted language. The empirical outcomes demonstrate the effectiveness of our approach, which outperforms previous state-of-the-art methods by attaining an exact match (EM) score of 94.78%, a precision score of 0.9487, a recall score of 0.9478, an F1 score of 0.948, an F0.5 score of 0.9483, and a modified accuracy (MA) score of 95.16% for Bangla spelling error correction. The models and corpus are publicly available at https://tinyurl.com/DPCSpell.
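To make the detector-purificator-corrector data flow concrete, the toy walk-through below uses stand-in functions in place of the denoising transformers: the detector tags suspicious characters, the purificator masks them, and the corrector fills the masks. The tiny English lexicon and matching rule are purely illustrative and not part of DPCSpell.

```python
# Toy walk-through of a three-stage detect -> purify -> correct pipeline.
LEXICON = {"speech", "language", "spelling"}

def detector(word):                      # tag characters: 1 = suspicious, 0 = keep
    best = min(LEXICON, key=lambda w: sum(a != b for a, b in zip(w, word)) + abs(len(w) - len(word)))
    return [int(i >= len(best) or word[i] != best[i]) for i in range(len(word))], best

def purificator(word, tags):             # replace suspicious characters with a mask
    return "".join("_" if t else c for c, t in zip(word, tags))

def corrector(masked, best):             # fill each mask from the lexicon candidate
    return "".join(b if c == "_" else c for c, b in zip(masked, best))

tags, best = detector("spelcing")
masked = purificator("spelcing", tags)
print(masked, "->", corrector(masked, best))   # spel_ing -> spelling
```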

Citations: 0
Knowledge-aware audio-grounded generative slot filling for limited annotated data
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1016/j.csl.2024.101707

Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.
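A hedged sketch of the "slot filling as knowledge-conditioned text generation" framing is shown below: the (possibly ASR-noisy) transcript and the predefined list of allowed slot values are serialized into a single prompt, and the free-form generation is snapped back to the closest allowed candidate. The prompt format and matching rule are assumptions, not KA2G's actual pointer-generator mechanism.

```python
# Sketch: serialize utterance + external knowledge into a generation prompt,
# then constrain the generated value to the predefined slot-value list.
import difflib

def build_prompt(transcript, slot, candidates):
    return (f"utterance: {transcript}\n"
            f"slot: {slot}\n"
            f"allowed values: {', '.join(candidates)}\n"
            f"value:")

def snap_to_knowledge(generated, candidates):
    # map the free-form generation onto the closest allowed slot value
    match = difflib.get_close_matches(generated, candidates, n=1, cutoff=0.0)
    return match[0]

candidates = ["jazz", "classical", "rock"]
print(build_prompt("play some jaz please", "music_genre", candidates))
print(snap_to_knowledge("jaz", candidates))   # -> jazz
```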

Citations: 0