Effectively capturing semantic dependencies within sentences is pivotal to relation extraction. However, challenges such as feature sparsity and the complexity of identifying the structure of target entity pairs, inherent in traditional feature extraction methods, pose significant obstacles. Existing methods that rely on combined features or recurrent networks also face limitations, such as over-reliance on prior knowledge or the vanishing-gradient problem. To address these limitations, we propose an Adaptive Feature Extraction (AFE) method that combines neural networks with feature engineering to capture high-order abstract and long-distance semantic dependencies. Our approach extracts atomic features from sentences, maps them into distributed representations, and categorizes these representations into multiple mixed features through adaptive combination, setting it apart from other methods. The proposed AFE-based model uses four different convolutional layers to facilitate feature learning and weighting from the adaptive feature representations, thereby enhancing the discriminative power of deep networks for relation extraction. Experimental results on the English datasets ACE05 English and SciERC, and the Chinese datasets ACE05 Chinese and CLTC(SanWen), demonstrate the superiority of our method: F1 scores improved by 4.16%, 3.99%, 0.82%, and 1.60%, respectively. In summary, our AFE method provides a flexible and effective solution to key challenges in cross-domain and cross-language relation extraction.
Title: Adaptive feature extraction for entity relation extraction
Authors: Weizhe Yang, Yongbin Qin, Ruizhang Huang, Yanping Chen
Computer Speech and Language, vol. 89, Article 101712. Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101712
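The first two steps of the pipeline described above — extracting atomic features around a candidate entity pair and mapping them to distributed representations — can be sketched as follows. The feature templates and the hashing stand-in for a learned embedding table are illustrative assumptions, not the authors' exact design:

```python
import hashlib

def atomic_features(tokens, e1_idx, e2_idx):
    """Extract simple atomic features for a candidate entity pair: the
    entity words, the words between them, and the entity distance."""
    feats = [f"E1={tokens[e1_idx]}", f"E2={tokens[e2_idx]}"]
    for tok in tokens[min(e1_idx, e2_idx) + 1 : max(e1_idx, e2_idx)]:
        feats.append(f"BETWEEN={tok}")
    feats.append(f"DIST={abs(e1_idx - e2_idx)}")
    return feats

def embed(feature, dim=8):
    """Map an atomic feature to a fixed distributed representation by
    hashing -- a deterministic stand-in for a learned embedding table."""
    digest = hashlib.md5(feature.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

tokens = "The company acquired the startup last year".split()
feats = atomic_features(tokens, 1, 4)   # entity pair: "company", "startup"
vectors = [embed(f) for f in feats]
```

In the full model, these vectors would be adaptively combined into mixed features and fed to the convolutional layers.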
Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101709
Xiang Hao, Chenglin Xu, Chen Zhang, Lei Xie
When processing noisy utterances with varying frequency bandwidths using an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, a state-of-the-art speech enhancement method. We introduced a multi-scale loss and a coordinate-embedded upsampling block to facilitate bandwidth extension while preserving speech enhancement capability. Additionally, we proposed a gradient loss function to promote the network's convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.
Title: A neural network approach for speech enhancement and noise-robust bandwidth extension. Computer Speech and Language, vol. 89, Article 101709.
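A multi-scale loss of the kind mentioned above generally compares prediction and target at several temporal resolutions. A minimal sketch follows; the mean-pool downsampling, L1 distance, and scale factors are assumptions for illustration, not the paper's exact formulation:

```python
def downsample(x, factor):
    """Mean-pool a signal by `factor`, truncating any remainder."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]

def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def multi_scale_loss(pred, target, factors=(1, 2, 4)):
    """Sum of L1 distances at several temporal resolutions, so errors
    are penalized both sample-by-sample and over longer spans."""
    return sum(l1(downsample(pred, f), downsample(target, f)) for f in factors)

loss = multi_scale_loss([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])  # 3.0
```

Coarser scales emphasize spectral-envelope-level agreement, which is what bandwidth extension chiefly needs.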
Pub Date: 2024-08-08 | DOI: 10.1016/j.csl.2024.101705
Xiyuan Jia, Zongqing Mao, Zhen Zhang, Qiyun Lv, Xin Wang, Guohua Wu
Paraphrase generation is an important method for augmenting text data and plays a crucial role in Natural Language Generation (NLG). However, existing methods struggle to capture both the semantic representation of input sentences and the syntactic structure of exemplars, which can easily lead to problems such as redundant content, semantic inaccuracies, and poor diversity. To tackle these challenges, we propose a Syntax-Controlled Paraphrase Generator (SCPG), which utilizes attention networks and VAE-based latent variables to model the semantics of input sentences and the syntax of exemplars. In addition, to achieve controllability of the target paraphrase structure, we propose a method for learning semantic and syntactic representations based on multi-task learning, and successfully integrate the two through a gating mechanism. Extensive experimental results show that SCPG achieves state-of-the-art (SOTA) results in terms of both semantic consistency and syntactic controllability, and strikes a better trade-off between preserving semantics and novelty of sentence structure.
Title: Syntax-controlled paraphrases generation with VAE and multi-task learning. Computer Speech and Language, vol. 89, Article 101705.
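The gating mechanism that integrates the semantic and syntactic representations can be sketched generically. The scalar gate and the weight shapes below are assumptions for exposition; the paper's gate may be vector-valued and learned jointly with the rest of the model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_fuse(h_sem, h_syn, w_sem, w_syn, bias=0.0):
    """Fuse semantic and syntactic vectors through a scalar gate:
    g = sigmoid(w_sem . h_sem + w_syn . h_syn + bias)
    out = g * h_sem + (1 - g) * h_syn"""
    score = sum(w * h for w, h in zip(w_sem, h_sem))
    score += sum(w * h for w, h in zip(w_syn, h_syn)) + bias
    g = sigmoid(score)
    return [g * a + (1 - g) * b for a, b in zip(h_sem, h_syn)]

# With zero weights the gate is 0.5 and the fusion is an even blend.
fused = gate_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

The learned gate lets the decoder lean on semantics or syntax as the target structure demands.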
Pub Date: 2024-08-07 | DOI: 10.1016/j.csl.2024.101703
Mehedi Hasan Bijoy, Nahid Hossain, Salekul Islam, Swakkhar Shatabda
Spelling error correction is the task of identifying and rectifying misspelled words in text. It is an active research topic in Natural Language Processing owing to its numerous applications in human language understanding. Phonetically or visually similar yet semantically distinct characters make it an arduous task in any language. Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods, which we found rather inefficient. In particular, machine learning-based approaches, although superior to rule-based and statistical methods, are ineffective because they correct each character regardless of its appropriateness. In this paper, we propose DPCSpell, a novel detector-purificator-corrector framework based on denoising transformers that addresses these issues. In addition, we present a method for large-scale corpus creation from scratch, which resolves the resource-limitation problem for any left-to-right scripted language. The empirical outcomes demonstrate the effectiveness of our approach, which outperforms previous state-of-the-art methods by attaining an exact match (EM) score of 94.78%, a precision of 0.9487, a recall of 0.9478, an F1 score of 0.948, an F0.5 score of 0.9483, and a modified accuracy (MA) score of 95.16% for Bangla spelling error correction. The models and corpus are publicly available at https://tinyurl.com/DPCSpell.
Title: A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages. Computer Speech and Language, vol. 89, Article 101703.
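The detector-purificator-corrector decomposition can be illustrated with a toy, lexicon-based stand-in. DPCSpell itself uses denoising transformers for each stage; the tiny lexicon and edit-distance corrector below exist only to make the three-stage data flow concrete:

```python
LEXICON = {"speech", "language", "model", "error", "correction"}

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def detector(words):
    """Stage 1: flag out-of-lexicon words as likely misspellings."""
    return [w not in LEXICON for w in words]

def purificator(words, flags):
    """Stage 2: mask flagged positions so the corrector sees clean
    context plus explicit targets."""
    return ["<mask>" if f else w for w, f in zip(words, flags)]

def corrector(words, flags):
    """Stage 3: replace each flagged word with the closest lexicon entry."""
    return [min(LEXICON, key=lambda c: edit_distance(w, c)) if f else w
            for w, f in zip(words, flags)]

words = ["speech", "erorr", "correction"]
flags = detector(words)
masked = purificator(words, flags)
fixed = corrector(words, flags)   # ["speech", "error", "correction"]
```

In the real framework each stage is a trained transformer rather than a lookup, but the staged interface is the same.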
Pub Date: 2024-08-05 | DOI: 10.1016/j.csl.2024.101701
Gyu-Ho Shin, Seongmin Mun
The present study investigates a computational model's ability to capture monolingual children's language behaviour during comprehension in Korean, an understudied language in the field. Specifically, we test whether and how two neural network architectures (LSTM, GPT-2) cope with a suffixal passive construction involving verbal morphology and required interpretive procedures (i.e., revising the mapping between thematic roles and case markers) driven by that morphology. To this end, we fine-tune our models via patching (i.e., pre-trained model + caregiver input) and hyperparameter adjustments, and measure their binary classification performance on the test sentences used in a behavioural study manifesting scrambling and omission of sentential components to varying degrees. We find that, while these models’ performance converges with the children's response patterns found in the behavioural study to some extent, the models do not faithfully simulate the children's comprehension behaviour pertaining to the suffixal passive, yielding by-model, by-condition, and by-hyperparameter asymmetries. This points to the limits of the neural networks’ capacity to address child language features. The implications of this study invite subsequent inquiries on the extent to which computational models reveal developmental trajectories of children's linguistic knowledge that have been unveiled through corpus-based or experimental research.
Title: Modelling child comprehension: A case of suffixal passive construction in Korean. Computer Speech and Language, vol. 90, Article 101701.
Pub Date: 2024-08-05 | DOI: 10.1016/j.csl.2024.101707
Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland
Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.
Title: Knowledge-aware audio-grounded generative slot filling for limited annotated data. Computer Speech and Language, vol. 89, Article 101707.
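The pointer-generator mechanism named above follows the standard formulation: the final distribution mixes the generator's vocabulary softmax with a copy distribution induced by attention over the source. The toy vocabulary and attention weights below are illustrative:

```python
def pointer_generator(p_gen, vocab_dist, attention, source_tokens):
    """Final output distribution of a pointer-generator:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on
    source positions where w occurs). Out-of-vocabulary source tokens
    can still receive probability through the copy term."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, tok in zip(attention, source_tokens):
        final[tok] = final.get(tok, 0.0) + (1 - p_gen) * a
    return final

vocab = {"play": 0.6, "music": 0.4}      # generator softmax
attn = [0.9, 0.1]                        # attention over the source
src = ["jazz", "music"]                  # "jazz" is out-of-vocabulary
dist = pointer_generator(0.5, vocab, attn, src)
```

This copy path is what lets rare slot values (e.g. names absent from the vocabulary) be produced verbatim from the input, benefiting few-shot and zero-shot slot filling.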
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we found that altering the downstream architecture structure leads to significant fluctuations in the performance ranking of the evaluated models. Against common practices in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization, and multi-level feature exploitation.
Title: Speech self-supervised representations benchmarking: A case for larger probing heads
Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Computer Speech and Language, vol. 89, Article 101695. Pub Date: 2024-08-03 | DOI: 10.1016/j.csl.2024.101695
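Why probing-head capacity matters for inference cost is easy to see from parameter counts. The sketch below compares a linear probe with a one-hidden-layer MLP probe; the 768-dimensional feature size and 1024-unit hidden layer are typical values, not ones taken from the paper:

```python
def probe_param_count(input_dim, num_classes, hidden_dims=()):
    """Parameter count of a probing head: a stack of affine layers
    (weight matrix + bias per layer) mapping frozen SSL features to
    class logits."""
    dims = [input_dim, *hidden_dims, num_classes]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

linear = probe_param_count(768, 10)        # 7,690 parameters
mlp = probe_param_count(768, 10, (1024,))  # 797,706 parameters
```

A two-order-of-magnitude gap in head capacity is one plausible reason model rankings shift when the downstream architecture changes.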
Pub Date: 2024-08-02 | DOI: 10.1016/j.csl.2024.101706
Shuang Liu, Xunqin Chen, Jiana Meng, Niko Lukač
A method for extracting relations from sentences by utilizing their dependency trees to identify key phrases is presented in this paper. Dependency trees are commonly used in natural language processing to represent the grammatical structure of a sentence, and this approach builds upon this representation to extract meaningful relations between phrases. Identifying key phrases is crucial in relation extraction as they often indicate the entities and actions involved in a relation. The method uses community detection algorithms on the dependency tree to identify groups of related words that form key phrases, such as subject-verb-object structures. The experiments on the Semeval-2010 task8 dataset and the TACRED dataset demonstrate that the proposed method outperforms existing baseline methods.
Title: Improved relation extraction through key phrase identification using community detection on dependency trees. Computer Speech and Language, vol. 89, Article 101706.
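The paper's key-phrase step runs community detection over the dependency tree. As a simpler illustration of why dependency edges expose subject-verb-object structures in the first place, the sketch below reads SVO triples straight from dependency labels rather than detecting communities (the label inventory follows Universal Dependencies conventions; the example parse is hand-crafted):

```python
def svo_key_phrases(tokens, heads, deprels):
    """Read subject-verb-object triples off a dependency parse: pair
    each nsubj token with its head verb and that verb's obj tokens.
    `heads` holds the 0-based index of each token's syntactic head."""
    triples = []
    for i, rel in enumerate(deprels):
        if rel == "nsubj":
            verb = heads[i]
            for j, r in enumerate(deprels):
                if r == "obj" and heads[j] == verb:
                    triples.append((tokens[i], tokens[verb], tokens[j]))
    return triples

tokens = ["Alice", "founded", "the", "company"]
heads = [1, 1, 3, 1]                      # "founded" is the root
deprels = ["nsubj", "root", "det", "obj"]
triples = svo_key_phrases(tokens, heads, deprels)
```

Community detection generalizes this idea by grouping related words into key phrases without hand-picking relation labels.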
Our work explores the differences between GRU-based and transformer-based approaches in the context of sentiment analysis on text dialog. In addition to the overall performance on the downstream task, we assess the knowledge transfer capabilities of the models by applying a thorough zero-shot analysis at the task level and examining cross-lingual performance across five European languages. The ability to generalize over different tasks and languages is of high importance, as the data needed for a particular application may be scarce or nonexistent. We perform evaluations on both known benchmark datasets and a novel synthetic dataset for dialog data containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We leverage the informative power of the conversational context, showing that appending the previous four utterances of the same speaker to the input sequence has the greatest benefit on inference performance. The cross-lingual and cross-task evaluations show that the transformer-based models possess transfer abilities superior to those of the GRU model, especially in the zero-shot setting. Owing to its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, while the classical BERT obtained the highest zero-shot accuracy on the MELD dataset, at 55.08%.
Title: Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data
Authors: Vlad-Andrei Negru, Vasile Suciu, Alex-Mihai Lăpuşan, Camelia Lemnaru, Mihaela Dînşoreanu, Rodica Potolea
Computer Speech and Language, vol. 89, Article 101704. Pub Date: 2024-07-31 | DOI: 10.1016/j.csl.2024.101704
Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101702
Ali Balali, Masoud Asadpour, Seyed Hossein Jafari
Large volumes of data are constantly being published on the web; however, much of it is unstructured, making it difficult to comprehend and interpret. To extract meaningful and structured information from such data, researchers and practitioners have turned to Information Extraction (IE) methods. One of the most challenging IE tasks is Event Extraction (EE), which involves extracting information related to specific incidents and their associated actors from text. EE has broad applications, including knowledge-base construction, information retrieval, summarization, and online monitoring systems. Over the past few decades, various event ontologies, such as ACE, CAMEO, and ICEWS, have been developed to define event forms, actors, and dimensions of events observed in text. However, these ontologies have limitations: they cover only a few topics (such as political events), define argument roles through inflexible structures, lack analytical dimensions, and offer insufficient gold-standard data. To address these concerns, we propose a new event ontology, COfEE, which integrates expert domain knowledge, previous ontologies, and a data-driven approach for identifying events from text. COfEE comprises two hierarchy levels (event types and event sub-types) that include new categories related to environmental issues, cyberspace, criminal activity, and natural disasters requiring real-time monitoring. In addition, dynamic roles are defined for each event sub-type to capture various dimensions of events. The proposed ontology is evaluated on Wikipedia events and shown to be comprehensive and general. Furthermore, to facilitate the preparation of gold-standard data for event extraction, we present a language-independent online tool based on COfEE. A gold-standard dataset of 24,000 Persian news articles, annotated by ten human experts according to the COfEE ontology, has also been prepared.
To diversify the data, news articles from the Wikipedia event portal and the 100 most popular Persian news agencies between 2008 and 2021 were collected. Finally, we introduce a supervised method based on deep learning techniques to automatically extract relevant events and their corresponding actors.
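The two-level structure described in the abstract, event types containing event sub-types, each with its own dynamic argument roles, can be represented with a simple data structure. This is a hedged sketch only: the category and role names below are illustrative assumptions, not taken from the COfEE specification itself.

```python
from dataclasses import dataclass, field


@dataclass
class EventSubType:
    name: str
    # Dynamic argument roles are defined per sub-type rather than fixed globally,
    # which is what distinguishes this scheme from rigid role inventories.
    roles: list


@dataclass
class EventType:
    name: str
    sub_types: dict = field(default_factory=dict)


# Illustrative ontology fragment (names are assumptions for the example).
ontology = {
    "NaturalDisaster": EventType("NaturalDisaster", {
        "Earthquake": EventSubType("Earthquake",
                                   ["magnitude", "location", "casualties", "time"]),
        "Flood": EventSubType("Flood",
                              ["location", "affected-area", "casualties", "time"]),
    }),
}

earthquake = ontology["NaturalDisaster"].sub_types["Earthquake"]
print(earthquake.roles)
```

Because each sub-type owns its role list, adding a new analytical dimension (say, an economic-damage role for floods) only touches that sub-type's entry, which mirrors the flexibility the ontology claims over fixed-role schemes.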
"COfEE: A comprehensive ontology for event extraction from text" by Ali Balali, Masoud Asadpour, Seyed Hossein Jafari. Computer Speech and Language, vol. 89, Article 101702. DOI: 10.1016/j.csl.2024.101702. Published 2024-07-31, open access.