Text augmentation for semantic frame induction and parsing

IF 1.7 3区计算机科学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Language Resources and Evaluation Pub Date : 2023-10-21 DOI:10.1007/s10579-023-09679-8

Saba Anwar, Artem Shelmanov, Nikolay Arefyev, Alexander Panchenko, Chris Biemann

{"title":"Text augmentation for semantic frame induction and parsing","authors":"Saba Anwar, Artem Shelmanov, Nikolay Arefyev, Alexander Panchenko, Chris Biemann","doi":"10.1007/s10579-023-09679-8","DOIUrl":null,"url":null,"abstract":"Abstract Semantic frames are formal structures describing situations, actions or events, e.g., Commerce buy , Kidnapping , or Exchange . Each frame provides a set of frame elements or semantic roles corresponding to participants of the situation and lexical units (LUs)—words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping , two key roles are Perpetrator and the Victim , and this frame can be evoked with lexical units abduct, kidnap , or snatcher . While formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinders the wider adoption of frame semantics across languages and domains. To tackle this problem, firstly, we propose a method that takes as input a few frame-annotated sentences and generates alternative lexical realizations of lexical units and semantic roles matching the original frame definition. Secondly, we show that the obtained synthetically generated semantic frame annotated examples help to improve the quality of frame-semantic parsing. To evaluate our proposed approach, we decompose our work into two parts. In the first part of text augmentation for LUs and roles, we experiment with various types of models such as distributional thesauri, non-contextualized word embeddings (word2vec, fastText, GloVe), and Transformer-based contextualized models, such as BERT or XLNet. We perform the intrinsic evaluation of these induced lexical substitutes using FrameNet gold annotations. Models based on Transformers show overall superior performance, however, they do not always outperform simpler models (based on static embeddings) unless information about the target word is suitably injected. However, we observe that non-contextualized models also show comparable performance on the task of LU expansion. We also show that combining substitutes of individual models can significantly improve the quality of final substitutes. Because intrinsic evaluation scores are highly dependent on the gold dataset and the frame preservation, and cannot be ensured by an automatic evaluation mechanism because of the incompleteness of gold datasets, we also carried out experiments with manual evaluation on sample datasets to further analyze the usefulness of our approach. The results show that the manual evaluation framework significantly outperforms automatic evaluation for lexical substitution. For extrinsic evaluation, the second part of this work assesses the utility of these lexical substitutes for the improvement of frame-semantic parsing. We took a small set of frame-annotated sentences and augmented them by replacing corresponding target words with their closest substitutes, obtained from best-performing models. Our extensive experiments on the original and augmented set of annotations with two semantic parsers show that our method is effective for improving the downstream parsing task by training set augmentation, as well as for quickly building FrameNet-like resources for new languages or subject domains.","PeriodicalId":49927,"journal":{"name":"Language Resources and Evaluation","volume":"51 11","pages":"0"},"PeriodicalIF":1.7000,"publicationDate":"2023-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Resources and Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10579-023-09679-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

Abstract Semantic frames are formal structures describing situations, actions or events, e.g., Commerce buy , Kidnapping , or Exchange . Each frame provides a set of frame elements or semantic roles corresponding to participants of the situation and lexical units (LUs)—words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping , two key roles are Perpetrator and the Victim , and this frame can be evoked with lexical units abduct, kidnap , or snatcher . While formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinders the wider adoption of frame semantics across languages and domains. To tackle this problem, firstly, we propose a method that takes as input a few frame-annotated sentences and generates alternative lexical realizations of lexical units and semantic roles matching the original frame definition. Secondly, we show that the obtained synthetically generated semantic frame annotated examples help to improve the quality of frame-semantic parsing. To evaluate our proposed approach, we decompose our work into two parts. In the first part of text augmentation for LUs and roles, we experiment with various types of models such as distributional thesauri, non-contextualized word embeddings (word2vec, fastText, GloVe), and Transformer-based contextualized models, such as BERT or XLNet. We perform the intrinsic evaluation of these induced lexical substitutes using FrameNet gold annotations. Models based on Transformers show overall superior performance, however, they do not always outperform simpler models (based on static embeddings) unless information about the target word is suitably injected. However, we observe that non-contextualized models also show comparable performance on the task of LU expansion. We also show that combining substitutes of individual models can significantly improve the quality of final substitutes. Because intrinsic evaluation scores are highly dependent on the gold dataset and the frame preservation, and cannot be ensured by an automatic evaluation mechanism because of the incompleteness of gold datasets, we also carried out experiments with manual evaluation on sample datasets to further analyze the usefulness of our approach. The results show that the manual evaluation framework significantly outperforms automatic evaluation for lexical substitution. For extrinsic evaluation, the second part of this work assesses the utility of these lexical substitutes for the improvement of frame-semantic parsing. We took a small set of frame-annotated sentences and augmented them by replacing corresponding target words with their closest substitutes, obtained from best-performing models. Our extensive experiments on the original and augmented set of annotations with two semantic parsers show that our method is effective for improving the downstream parsing task by training set augmentation, as well as for quickly building FrameNet-like resources for new languages or subject domains.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

用于语义框架归纳和解析的文本增强

语义框架是描述情景、动作或事件的正式结构，例如，商务购买、绑架或交换。每个框架都提供了一组框架元素或语义角色，这些元素或语义角色对应于情景和词汇单位(lu)的参与者——在文本中可以唤起这一特定框架的单词和短语。例如，在绑架框架中，两个关键角色是加害者和受害者，这个框架可以用词汇单位诱拐、绑架或抢夺者来唤起。尽管语义框架资源在形式上是合理的，但其缺乏可用性及其有限的词汇覆盖范围阻碍了跨语言和领域更广泛地采用框架语义。为了解决这个问题，首先，我们提出了一种方法，该方法将一些带有框架注释的句子作为输入，并生成与原始框架定义匹配的词汇单位和语义角色的备选词汇实现。其次，我们证明了所获得的综合生成的语义框架注释示例有助于提高框架语义解析的质量。为了评估我们提出的方法，我们将工作分解为两个部分。在逻辑单元和角色的文本增强的第一部分中，我们试验了各种类型的模型，如分布式同义词库、非上下文化的词嵌入(word2vec、fastText、GloVe)和基于transformer的上下文化模型，如BERT或XLNet。我们使用框架金注释对这些诱导的词汇替代进行内在评估。然而，基于transformer的模型总体上表现出优越的性能，除非适当地注入了有关目标单词的信息，否则它们并不总是优于更简单的模型(基于静态嵌入)。然而，我们观察到非上下文化模型在逻辑单元扩展任务上也表现出相当的性能。我们还发现，将单个模型的替代组合起来可以显著提高最终替代的质量。由于内在评价分数高度依赖于黄金数据集和框架保存，并且由于黄金数据集的不完整性，无法通过自动评价机制来保证内在评价分数，我们还在样本数据集上进行了人工评价实验，以进一步分析我们方法的有效性。结果表明，人工评价框架在词汇替换方面明显优于自动评价框架。对于外部评价，本工作的第二部分评估了这些词汇替代品对改进框架语义解析的效用。我们取了一小组带框架注释的句子，并通过用从性能最好的模型中获得的最接近的替代品替换相应的目标词来增强它们。我们用两个语义解析器对原始和增强的注释集进行了广泛的实验，结果表明我们的方法可以有效地通过训练集增强来改进下游解析任务，并且可以快速地为新语言或主题领域构建类似框架的资源。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Language Resources and Evaluation 工程技术-计算机：跨学科应用

CiteScore

6.50

自引率

3.70%

发文量

审稿时长

>12 weeks

期刊介绍： Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications. Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use. Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.