Effectively capturing semantic dependencies within sentences is pivotal to relation extraction. However, challenges such as feature sparsity and the complexity of identifying the structure of target entity pairs, inherent in traditional feature extraction methods, pose significant obstacles. Existing methods that rely on combined features or recurrent networks also face limitations, such as over-reliance on prior knowledge or the vanishing-gradient problem. To address these limitations, we propose an Adaptive Feature Extraction (AFE) method that combines neural networks with feature engineering to capture high-order abstract and long-distance semantic dependencies. Our approach extracts atomic features from sentences, maps them into distributed representations, and categorizes these representations into multiple mixed features through adaptive combination, setting it apart from other methods. The proposed AFE-based model uses four different convolutional layers to facilitate feature learning and weighting from the adaptive feature representations, thereby enhancing the discriminative power of deep networks for relation extraction. Experimental results on the English datasets ACE05 English and SciERC, and the Chinese datasets ACE05 Chinese and CLTC(SanWen), demonstrate the superiority of our method: F1 scores improved by 4.16%, 3.99%, 0.82%, and 1.60%, respectively. In summary, our AFE method provides a flexible and effective solution to key challenges in cross-domain and cross-language relation extraction.
Title: Adaptive feature extraction for entity relation extraction
Authors: Weizhe Yang, Yongbin Qin, Ruizhang Huang, Yanping Chen
Computer Speech and Language, vol. 89, Article 101712. Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101712
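The first two steps of the pipeline described above — extracting atomic features around a candidate entity pair and mapping them to distributed representations — can be sketched as follows. The feature templates and the hashing stand-in for a learned embedding table are illustrative assumptions, not the authors' exact design:

```python
import hashlib

def atomic_features(tokens, e1_idx, e2_idx):
    """Extract simple atomic features for a candidate entity pair: the
    entity words, the words between them, and the entity distance."""
    feats = [f"E1={tokens[e1_idx]}", f"E2={tokens[e2_idx]}"]
    for tok in tokens[min(e1_idx, e2_idx) + 1 : max(e1_idx, e2_idx)]:
        feats.append(f"BETWEEN={tok}")
    feats.append(f"DIST={abs(e1_idx - e2_idx)}")
    return feats

def embed(feature, dim=8):
    """Map an atomic feature to a fixed distributed representation by
    hashing -- a deterministic stand-in for a learned embedding table."""
    digest = hashlib.md5(feature.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

tokens = "The company acquired the startup last year".split()
feats = atomic_features(tokens, 1, 4)   # entity pair: "company", "startup"
vectors = [embed(f) for f in feats]
```

In the full model, these vectors would be adaptively combined into mixed features and fed to the convolutional layers.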
Pub Date: 2024-08-13 | DOI: 10.1016/j.csl.2024.101709
Xiang Hao, Chenglin Xu, Chen Zhang, Lei Xie
When processing noisy utterances with varying frequency bandwidths using an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, a state-of-the-art speech enhancement method. We introduced a multi-scale loss and a coordinate-embedded upsampling block to facilitate bandwidth extension while preserving speech enhancement capability. Additionally, we proposed a gradient loss function to promote the network's convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.
Title: A neural network approach for speech enhancement and noise-robust bandwidth extension. Computer Speech and Language, vol. 89, Article 101709.
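A multi-scale loss of the kind mentioned above generally compares prediction and target at several temporal resolutions. A minimal sketch follows; the mean-pool downsampling, L1 distance, and scale factors are assumptions for illustration, not the paper's exact formulation:

```python
def downsample(x, factor):
    """Mean-pool a signal by `factor`, truncating any remainder."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]

def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def multi_scale_loss(pred, target, factors=(1, 2, 4)):
    """Sum of L1 distances at several temporal resolutions, so errors
    are penalized both sample-by-sample and over longer spans."""
    return sum(l1(downsample(pred, f), downsample(target, f)) for f in factors)

loss = multi_scale_loss([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])  # 3.0
```

Coarser scales emphasize spectral-envelope-level agreement, which is what bandwidth extension chiefly needs.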
Pub Date: 2024-08-08 | DOI: 10.1016/j.csl.2024.101705
Xiyuan Jia, Zongqing Mao, Zhen Zhang, Qiyun Lv, Xin Wang, Guohua Wu
Paraphrase generation is an important method for augmenting text data and plays a crucial role in Natural Language Generation (NLG). However, existing methods struggle to capture both the semantic representation of input sentences and the syntactic structure of exemplars, which can easily lead to problems such as redundant content, semantic inaccuracies, and poor diversity. To tackle these challenges, we propose a Syntax-Controlled Paraphrase Generator (SCPG), which utilizes attention networks and VAE-based latent variables to model the semantics of input sentences and the syntax of exemplars. In addition, to achieve controllability of the target paraphrase structure, we propose a method for learning semantic and syntactic representations based on multi-task learning, and successfully integrate the two through a gating mechanism. Extensive experimental results show that SCPG achieves state-of-the-art (SOTA) results in terms of both semantic consistency and syntactic controllability, and strikes a better trade-off between preserving semantics and novelty of sentence structure.
Title: Syntax-controlled paraphrases generation with VAE and multi-task learning. Computer Speech and Language, vol. 89, Article 101705.
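The gating mechanism that integrates the semantic and syntactic representations can be sketched generically. The scalar gate and the weight shapes below are assumptions for exposition; the paper's gate may be vector-valued and learned jointly with the rest of the model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_fuse(h_sem, h_syn, w_sem, w_syn, bias=0.0):
    """Fuse semantic and syntactic vectors through a scalar gate:
    g = sigmoid(w_sem . h_sem + w_syn . h_syn + bias)
    out = g * h_sem + (1 - g) * h_syn"""
    score = sum(w * h for w, h in zip(w_sem, h_sem))
    score += sum(w * h for w, h in zip(w_syn, h_syn)) + bias
    g = sigmoid(score)
    return [g * a + (1 - g) * b for a, b in zip(h_sem, h_syn)]

# With zero weights the gate is 0.5 and the fusion is an even blend.
fused = gate_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

The learned gate lets the decoder lean on semantics or syntax as the target structure demands.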
Pub Date: 2024-08-07 | DOI: 10.1016/j.csl.2024.101703
Mehedi Hasan Bijoy, Nahid Hossain, Salekul Islam, Swakkhar Shatabda
Spelling error correction is the task of identifying and rectifying misspelled words in text. It is an active research topic in Natural Language Processing owing to its numerous applications in human language understanding. Phonetically or visually similar yet semantically distinct characters make it an arduous task in any language. Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods, which we found rather inefficient. In particular, machine learning-based approaches, although superior to rule-based and statistical methods, are ineffective because they correct each character regardless of its appropriateness. In this paper, we propose DPCSpell, a novel detector-purificator-corrector framework based on denoising transformers that addresses these issues. In addition, we present a method for large-scale corpus creation from scratch, which resolves the resource-limitation problem for any left-to-right scripted language. The empirical outcomes demonstrate the effectiveness of our approach, which outperforms previous state-of-the-art methods by attaining an exact match (EM) score of 94.78%, a precision of 0.9487, a recall of 0.9478, an F1 score of 0.948, an F0.5 score of 0.9483, and a modified accuracy (MA) score of 95.16% for Bangla spelling error correction. The models and corpus are publicly available at https://tinyurl.com/DPCSpell.
Title: A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages. Computer Speech and Language, vol. 89, Article 101703.
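The detector-purificator-corrector decomposition can be illustrated with a toy, lexicon-based stand-in. DPCSpell itself uses denoising transformers for each stage; the tiny lexicon and edit-distance corrector below exist only to make the three-stage data flow concrete:

```python
LEXICON = {"speech", "language", "model", "error", "correction"}

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def detector(words):
    """Stage 1: flag out-of-lexicon words as likely misspellings."""
    return [w not in LEXICON for w in words]

def purificator(words, flags):
    """Stage 2: mask flagged positions so the corrector sees clean
    context plus explicit targets."""
    return ["<mask>" if f else w for w, f in zip(words, flags)]

def corrector(words, flags):
    """Stage 3: replace each flagged word with the closest lexicon entry."""
    return [min(LEXICON, key=lambda c: edit_distance(w, c)) if f else w
            for w, f in zip(words, flags)]

words = ["speech", "erorr", "correction"]
flags = detector(words)
masked = purificator(words, flags)
fixed = corrector(words, flags)   # ["speech", "error", "correction"]
```

In the real framework each stage is a trained transformer rather than a lookup, but the staged interface is the same.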
Pub Date: 2024-08-05 | DOI: 10.1016/j.csl.2024.101701
Gyu-Ho Shin, Seongmin Mun
The present study investigates a computational model's ability to capture monolingual children's language behaviour during comprehension in Korean, an understudied language in the field. Specifically, we test whether and how two neural network architectures (LSTM, GPT-2) cope with a suffixal passive construction involving verbal morphology and required interpretive procedures (i.e., revising the mapping between thematic roles and case markers) driven by that morphology. To this end, we fine-tune our models via patching (i.e., pre-trained model + caregiver input) and hyperparameter adjustments, and measure their binary classification performance on the test sentences used in a behavioural study manifesting scrambling and omission of sentential components to varying degrees. We find that, while these models’ performance converges with the children's response patterns found in the behavioural study to some extent, the models do not faithfully simulate the children's comprehension behaviour pertaining to the suffixal passive, yielding by-model, by-condition, and by-hyperparameter asymmetries. This points to the limits of the neural networks’ capacity to address child language features. The implications of this study invite subsequent inquiries on the extent to which computational models reveal developmental trajectories of children's linguistic knowledge that have been unveiled through corpus-based or experimental research.
Title: Modelling child comprehension: A case of suffixal passive construction in Korean. Computer Speech and Language, vol. 90, Article 101701.
Pub Date: 2024-08-05 | DOI: 10.1016/j.csl.2024.101707
Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland
Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.
Title: Knowledge-aware audio-grounded generative slot filling for limited annotated data. Computer Speech and Language, vol. 89, Article 101707.
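The pointer-generator mechanism named above follows the standard formulation: the final distribution mixes the generator's vocabulary softmax with a copy distribution induced by attention over the source. The toy vocabulary and attention weights below are illustrative:

```python
def pointer_generator(p_gen, vocab_dist, attention, source_tokens):
    """Final output distribution of a pointer-generator:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on
    source positions where w occurs). Out-of-vocabulary source tokens
    can still receive probability through the copy term."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, tok in zip(attention, source_tokens):
        final[tok] = final.get(tok, 0.0) + (1 - p_gen) * a
    return final

vocab = {"play": 0.6, "music": 0.4}      # generator softmax
attn = [0.9, 0.1]                        # attention over the source
src = ["jazz", "music"]                  # "jazz" is out-of-vocabulary
dist = pointer_generator(0.5, vocab, attn, src)
```

This copy path is what lets rare slot values (e.g. names absent from the vocabulary) be produced verbatim from the input, benefiting few-shot and zero-shot slot filling.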
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we found that altering the downstream architecture structure leads to significant fluctuations in the performance ranking of the evaluated models. Against common practices in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization, and multi-level feature exploitation.
Title: Speech self-supervised representations benchmarking: A case for larger probing heads
Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Computer Speech and Language, vol. 89, Article 101695. Pub Date: 2024-08-03 | DOI: 10.1016/j.csl.2024.101695
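Why probing-head capacity matters for inference cost is easy to see from parameter counts. The sketch below compares a linear probe with a one-hidden-layer MLP probe; the 768-dimensional feature size and 1024-unit hidden layer are typical values, not ones taken from the paper:

```python
def probe_param_count(input_dim, num_classes, hidden_dims=()):
    """Parameter count of a probing head: a stack of affine layers
    (weight matrix + bias per layer) mapping frozen SSL features to
    class logits."""
    dims = [input_dim, *hidden_dims, num_classes]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

linear = probe_param_count(768, 10)        # 7,690 parameters
mlp = probe_param_count(768, 10, (1024,))  # 797,706 parameters
```

A two-order-of-magnitude gap in head capacity is one plausible reason model rankings shift when the downstream architecture changes.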
Pub Date: 2024-08-02 | DOI: 10.1016/j.csl.2024.101706
Shuang Liu, Xunqin Chen, Jiana Meng, Niko Lukač
A method for extracting relations from sentences by utilizing their dependency trees to identify key phrases is presented in this paper. Dependency trees are commonly used in natural language processing to represent the grammatical structure of a sentence, and this approach builds upon this representation to extract meaningful relations between phrases. Identifying key phrases is crucial in relation extraction as they often indicate the entities and actions involved in a relation. The method uses community detection algorithms on the dependency tree to identify groups of related words that form key phrases, such as subject-verb-object structures. The experiments on the Semeval-2010 task8 dataset and the TACRED dataset demonstrate that the proposed method outperforms existing baseline methods.
Title: Improved relation extraction through key phrase identification using community detection on dependency trees. Computer Speech and Language, vol. 89, Article 101706.
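The paper's key-phrase step runs community detection over the dependency tree. As a simpler illustration of why dependency edges expose subject-verb-object structures in the first place, the sketch below reads SVO triples straight from dependency labels rather than detecting communities (the label inventory follows Universal Dependencies conventions; the example parse is hand-crafted):

```python
def svo_key_phrases(tokens, heads, deprels):
    """Read subject-verb-object triples off a dependency parse: pair
    each nsubj token with its head verb and that verb's obj tokens.
    `heads` holds the 0-based index of each token's syntactic head."""
    triples = []
    for i, rel in enumerate(deprels):
        if rel == "nsubj":
            verb = heads[i]
            for j, r in enumerate(deprels):
                if r == "obj" and heads[j] == verb:
                    triples.append((tokens[i], tokens[verb], tokens[j]))
    return triples

tokens = ["Alice", "founded", "the", "company"]
heads = [1, 1, 3, 1]                      # "founded" is the root
deprels = ["nsubj", "root", "det", "obj"]
triples = svo_key_phrases(tokens, heads, deprels)
```

Community detection generalizes this idea by grouping related words into key phrases without hand-picking relation labels.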
Our work explores the differences between GRU-based and transformer-based approaches in the context of sentiment analysis on text dialog. In addition to the overall performance on the downstream task, we assess the knowledge transfer capabilities of the models by applying a thorough zero-shot analysis at the task level and examining cross-lingual performance across five European languages. The ability to generalize over different tasks and languages is of high importance, as the data needed for a particular application may be scarce or nonexistent. We perform evaluations on both known benchmark datasets and a novel synthetic dataset for dialog data containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We leverage the informative power of the conversational context, showing that appending the previous four utterances of the same speaker to the input sequence has the greatest benefit on inference performance. The cross-lingual and cross-task evaluations show that the transformer-based models possess transfer abilities superior to those of the GRU model, especially in the zero-shot setting. Owing to its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, while the classical BERT obtained the highest zero-shot accuracy on the MELD dataset, at 55.08%.
Title: Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data
Authors: Vlad-Andrei Negru, Vasile Suciu, Alex-Mihai Lăpuşan, Camelia Lemnaru, Mihaela Dînşoreanu, Rodica Potolea
Computer Speech and Language, vol. 89, Article 101704. Pub Date: 2024-07-31 | DOI: 10.1016/j.csl.2024.101704
Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101702
Ali Balali, Masoud Asadpour, Seyed Hossein Jafari
Large volumes of data are constantly being published on the web; however, much of it is unstructured, making it difficult to comprehend and interpret. To extract meaningful and structured information from such data, researchers and practitioners have turned to Information Extraction (IE) methods. One of the most challenging IE tasks is Event Extraction (EE), which involves extracting information related to specific incidents and their associated actors from text. EE has broad applications, including knowledge-base construction, information retrieval, summarization, and online monitoring systems. Over the past few decades, various event ontologies, such as ACE, CAMEO, and ICEWS, have been developed to define event forms, actors, and dimensions of events observed in text. However, these ontologies have limitations: they cover only a few topics (such as political events), define argument roles through inflexible structures, lack analytical dimensions, and offer insufficient gold-standard data. To address these concerns, we propose a new event ontology, COfEE, which integrates expert domain knowledge, previous ontologies, and a data-driven approach for identifying events from text. COfEE comprises two hierarchy levels (event types and event sub-types) that include new categories related to environmental issues, cyberspace, criminal activity, and natural disasters requiring real-time monitoring. In addition, dynamic roles are defined for each event sub-type to capture various dimensions of events. The proposed ontology is evaluated on Wikipedia events and shown to be comprehensive and general. Furthermore, to facilitate the preparation of gold-standard data for event extraction, we present a language-independent online tool based on COfEE. A gold-standard dataset of 24,000 Persian news articles, annotated by ten human experts according to the COfEE ontology, has also been prepared.
To diversify the data, news articles from the Wikipedia event portal and the 100 most popular Persian news agencies between 2008 and 2021 were collected. Finally, we introduce a supervised method based on deep learning techniques to automatically extract relevant events and their corresponding actors.
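The two-level structure described in the abstract, event types containing event sub-types, each with its own dynamic argument roles, can be represented with a simple data structure. This is a hedged sketch only: the category and role names below are illustrative assumptions, not taken from the COfEE specification itself.

```python
from dataclasses import dataclass, field


@dataclass
class EventSubType:
    name: str
    # Dynamic argument roles are defined per sub-type rather than fixed globally,
    # which is what distinguishes this scheme from rigid role inventories.
    roles: list


@dataclass
class EventType:
    name: str
    sub_types: dict = field(default_factory=dict)


# Illustrative ontology fragment (names are assumptions for the example).
ontology = {
    "NaturalDisaster": EventType("NaturalDisaster", {
        "Earthquake": EventSubType("Earthquake",
                                   ["magnitude", "location", "casualties", "time"]),
        "Flood": EventSubType("Flood",
                              ["location", "affected-area", "casualties", "time"]),
    }),
}

earthquake = ontology["NaturalDisaster"].sub_types["Earthquake"]
print(earthquake.roles)
```

Because each sub-type owns its role list, adding a new analytical dimension (say, an economic-damage role for floods) only touches that sub-type's entry, which mirrors the flexibility the ontology claims over fixed-role schemes.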
"COfEE: A comprehensive ontology for event extraction from text" by Ali Balali, Masoud Asadpour, Seyed Hossein Jafari. Computer Speech and Language, vol. 89, Article 101702. DOI: 10.1016/j.csl.2024.101702. Published 2024-07-31, open access.