Natural Language Engineering: Latest Articles, Page 7

A benchmark for evaluating Arabic word embedding models

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Natural Language Engineering

Pub Date : 2022-10-17 DOI: 10.1017/S1351324922000444

S. Yagi, A. Elnagar, Shehdeh Fareh

Modelling the distributional semantics of such a morphologically rich language as Arabic needs to take into account its introflexive, fusional, and inflectional nature, attributes that make up its combinatorial sequences and substitutional paradigms. To evaluate such word distributional models, the benchmarks that have been used thus far in Arabic have mimicked those in English. This paper reports on a benchmark that we designed to reflect linguistic patterns in both Contemporary Arabic and Classical Arabic, the first being a cover term for written and spoken Modern Standard Arabic, and the second for pre-modern Arabic. The analogy items we included in this benchmark are chosen in a transparent manner such that they would capture the major features of nouns and verbs; derivational and inflectional morphology; high-, middle-, and low-frequency patterns and lexical items; and morphosemantic, morphosyntactic, and semantic dimensions of the language. All categories included in this benchmark are carefully selected to ensure proper representation of the language. The benchmark consists of 45 roots of the trilateral, all-consonantal, and semivowel-inclusive types; six morphosemantic patterns (’af‘ala; ifta‘ala; infa‘ala; istaf‘ala; tafa‘‘ala; and tafā‘ala); five derivations (the verbal noun, active participle, and the contrasts in Masculine-Feminine; Feminine-Singular-Plural; Masculine-Singular-Plural); morphosyntactic transformations (perfect and imperfect verbs conjugated for all pronouns); and lexical semantics (synonyms, antonyms, and hyponyms of nouns, verbs, and adjectives), as well as capital cities and currencies. All categories include an equal proportion of high-, medium-, and low-frequency items. For the purpose of validating the proposed benchmark, we developed a set of embedding models from different textual sources. Then, we tested them intrinsically using the proposed benchmark and extrinsically using two natural language processing tasks: Arabic Named Entity Recognition and Text Classification. The evaluation leads to the conclusion that the proposed benchmark is truly reflective of this morphologically rich language and discriminatory of word embeddings.
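As a rough illustration of how an analogy benchmark of this kind is typically scored, the sketch below answers "a is to b as c is to ?" by vector offset and cosine similarity, then reports accuracy over a list of items. The abstract does not state the scoring rule the authors used, so the offset method, the toy three-dimensional vectors, and the transliterated word forms are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def solve_analogy(vectors, a, b, c):
    """Return the word whose vector is closest to b - a + c, excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def analogy_accuracy(vectors, items):
    """Fraction of (a, b, c, expected) items answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in items)
    return correct / len(items)

# Hypothetical toy embeddings; real models would have hundreds of dimensions.
TOY_VECTORS = {
    "kataba": [1.0, 0.0, 0.0],   # perfect verb
    "katib":  [1.0, 1.0, 0.0],   # active participle
    "darasa": [0.0, 0.0, 1.0],   # perfect verb
    "daris":  [0.0, 1.0, 1.0],   # active participle
    "qalam":  [0.5, -1.0, 0.2],  # distractor noun
}

# Hypothetical analogy items in the form (a, b, c, expected).
ITEMS = [
    ("kataba", "katib", "darasa", "daris"),
    ("darasa", "daris", "kataba", "katib"),
]

if __name__ == "__main__":
    print(analogy_accuracy(TOY_VECTORS, ITEMS))
```

Excluding the three query words from the candidate set is a common convention in analogy evaluation; whether the benchmark follows it is not stated in the abstract.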

Volume 29, pp. 978-1003.

Citations: 2

A randomized prospective study of a hybrid rule- and data-driven virtual patient

Pub Date : 2022-09-23 DOI: 10.1017/s1351324922000420

Adam Stiff, Michael White, E. Fosler-Lussier, Lifeng Jin, Evan Jaffe, D. Danforth

Randomized prospective studies represent the gold standard for experimental design. In this paper, we present a randomized prospective study to validate the benefits of combining rule-based and data-driven natural language understanding methods in a virtual patient dialogue system. The system uses a rule-based pattern matching approach together with a machine learning (ML) approach in the form of a text-based convolutional neural network, combining the two methods with a simple logistic regression model to choose between their predictions for each dialogue turn. In an earlier, retrospective study, the hybrid system yielded a nearly 50% error reduction on our initial data, in part due to the differential performance between the two methods as a function of label frequency. Given these gains, and considering that our hybrid approach is unique among virtual patient systems, we compare the hybrid system to the rule-based system by itself in a randomized prospective study. We evaluate 110 unique medical student subjects interacting with the system over 5,296 conversation turns, to verify whether similar gains are observed in a deployed system. This prospective study broadly confirms the findings from the earlier one but also highlights important deficits in our training data. The hybrid approach still improves over either rule-based or ML approaches individually, even handling unseen classes with some success. However, we observe that live subjects ask more out-of-scope questions than expected. To better handle such questions, we investigate several modifications to the system combination component. These show significant overall accuracy improvements and modest F1 improvements on out-of-scope queries in an offline evaluation. We provide further analysis to characterize the difficulty of the out-of-scope problem that we have identified, as well as to suggest future improvements over the baseline we establish here.
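The combination step described above, a logistic regression that chooses between the rule-based and ML predictions for each turn, can be sketched roughly as follows. The two input features, the coefficient values, and the label names are hypothetical; the abstract does not list the features or fitted weights the authors used.

```python
import math
from dataclasses import dataclass

@dataclass
class TurnPredictions:
    """Outputs of both components for one dialogue turn (hypothetical fields)."""
    rule_label: str
    rule_score: float      # assumed pattern-match score in [0, 1]
    ml_label: str
    ml_confidence: float   # assumed top-class probability from the CNN

# Hypothetical hand-set coefficients; a real chooser would be fitted on held-out turns.
WEIGHTS = {"bias": -0.5, "ml_confidence": 3.0, "rule_score": -2.5}

def prob_trust_ml(turn):
    """Logistic-regression estimate that the ML prediction is the better one."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["ml_confidence"] * turn.ml_confidence
         + WEIGHTS["rule_score"] * turn.rule_score)
    return 1.0 / (1.0 + math.exp(-z))

def choose_label(turn, threshold=0.5):
    """Pick one component's label for this turn."""
    if turn.rule_label == turn.ml_label:
        return turn.rule_label  # both components agree, nothing to choose
    return turn.ml_label if prob_trust_ml(turn) >= threshold else turn.rule_label

if __name__ == "__main__":
    turn = TurnPredictions("ask_pain_location", 0.2, "ask_pain_onset", 0.95)
    print(choose_label(turn))
```

A chooser of this shape only selects between existing predictions, so its accuracy is bounded by whichever component is right on a given turn; the out-of-scope problem the abstract raises arises when neither component's label set covers the question.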

Citations: 3

Improving aspect-based neural sentiment classification with lexicon enhancement, attention regularization and sentiment induction

Pub Date : 2022-09-12 DOI: 10.1017/s1351324922000432

Lingxian Bao, Patrik Lambert, Toni Badia

Deep neural networks as an end-to-end approach lack robustness from an application point of view, as it is very difficult to fix an obvious problem without retraining the model, for example, when a model consistently predicts positive when seeing the word “terrible.” Meanwhile, it is less often noted that the commonly used attention mechanism is likely to “over-fit” by being overly sparse, so that some key positions in the input sequence could be overlooked by the network. To address these problems, we proposed a lexicon-enhanced attention LSTM model in 2019, named ATLX. In this paper, we describe extended experiments and analysis of the ATLX model. We also try to further improve the aspect-based sentiment analysis system by incorporating a vector-based sentiment domain adaptation method.

Citations: 0

Leveraging machine translation for cross-lingual fine-grained cyberbullying classification amongst pre-adolescents

Pub Date : 2022-09-07 DOI: 10.1017/s1351324922000341

Kanishk Verma, Maja Popovic, Alexandros Poulis, Y. Cherkasova, Cathal Ó hÓbáin, A. Mazzone, Tijana Milosevic, Brian Davis

Cyberbullying is the wilful and repeated infliction of harm on an individual using the Internet and digital technologies. Similar to face-to-face bullying, cyberbullying can be captured formally using the Routine Activities Model (RAM), whereby the potential victim and bully are brought into proximity of one another via interaction on online social networking (OSN) platforms. Although the impact of the COVID-19 (SARS-CoV-2) restrictions on the online presence of minors has yet to be fully grasped, studies have reported that 44% of pre-adolescents encountered more cyberbullying incidents during the COVID-19 lockdown. Transparency reports shared by OSN companies indicate increased take-downs of cyberbullying-related comments, posts or content by artificially intelligent moderation tools. However, in order to efficiently and effectively detect or identify whether a social media post or comment qualifies as cyberbullying, a number of factors based on the RAM must be taken into account, including the identification of cyberbullying roles and forms. This demands the acquisition of large amounts of fine-grained annotated data, which is costly and ethically challenging to produce. In addition, where fine-grained datasets do exist, they may be unavailable in the target language. Manual translation is costly; however, state-of-the-art neural machine translation offers a workaround. This study presents a first-of-its-kind experiment in leveraging machine translation to automatically translate a unique pre-adolescent cyberbullying gold standard dataset in Italian with fine-grained annotations into English for training and testing a native binary classifier for pre-adolescent cyberbullying. In addition to contributing a high-quality English reference translation of the source gold standard, our experiments indicate that the performance of our target binary classifier when trained on machine-translated English output is on par with the source (Italian) classifier.

Citations: 1

RoLEX: The development of an extended Romanian lexical dataset and its evaluation at predicting concurrent lexical information

Pub Date : 2022-08-26 DOI: 10.1017/S1351324922000419

Beáta Lőrincz, E. Irimia, Adriana Stan, Verginica Barbu Mititelu

In this article, we introduce an extended, freely available resource for the Romanian language, named RoLEX. The dataset was developed mainly for speech processing applications, yet its applicability extends beyond this domain. RoLEX includes over 330,000 curated entries with information regarding lemma, morphosyntactic description, syllabification, lexical stress and phonemic transcription. The process of selecting the list of word entries and semi-automatically annotating the complete lexical information associated with each of the entries is thoroughly described. The dataset’s inherent knowledge is then evaluated in a task of concurrent prediction of syllabification, lexical stress marking and phonemic transcription. The evaluation looked into several dataset design factors, such as the minimum viable number of entries for correct prediction, the optimisation of the minimum number of required entries through expert selection and the augmentation of the input with morphosyntactic information, as well as the influence of each task on the overall accuracy. The best results were obtained when the orthographic form of the entries was augmented with the complete morphosyntactic tags. A word error rate of 3.08% and a character error rate of 1.08% were obtained this way. We show that training on a carefully selected subset of entries can match the performance obtained with a set of randomly selected entries twice as large. In terms of prediction complexity, lexical stress marking posed the most problems and accounted for around 60% of the errors in the predicted sequence.
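For readers unfamiliar with the two metrics reported above, the sketch below shows one common way to compute them for a lexicon-style task where each entry is a single predicted string. It assumes word error rate is the share of entries with any mismatch and character error rate is the total Levenshtein distance divided by the total reference length; the abstract does not spell out the exact definitions, and the sample strings are invented placeholders rather than RoLEX data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rates(references, hypotheses):
    """Return (word error rate, character error rate) over paired entries."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need equally sized, non-empty lists")
    wrong_entries = sum(r != h for r, h in zip(references, hypotheses))
    char_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return wrong_entries / len(references), char_errors / total_chars

if __name__ == "__main__":
    # Invented placeholder strings standing in for predicted output sequences.
    refs = ["abc", "de"]
    hyps = ["abd", "de"]
    wer, cer = error_rates(refs, hyps)
    print(f"WER={wer:.2%} CER={cer:.2%}")
```

Because a single wrong character makes the whole entry count as a word error, the word error rate is always at least as sensitive as the character error rate, which is consistent with the gap between the two figures quoted in the abstract.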

Volume 29, pp. 720-745.

Citations: 1

Recognition of visual scene elements from a story text in Persian natural language

Pub Date : 2022-08-24 DOI: 10.1017/s1351324922000390

Mojdeh Hashemi-Namin, M. Jahed-Motlagh, Adel Torkaman Rahmani

Text-to-scene conversion systems map natural language text to formal representations required for visual scenes. The difficulty involved in this mapping is one of the most critical challenges for developing these systems. The current study is the first to map Persian natural language text to a conceptual scene model. This conceptual scene model is an intermediate semantic representation between natural language and the visual scene and contains descriptions of visual elements of the scene. It will be used to produce meaningful animation based on an input story in this ongoing study. The mapping task was modeled as a sequential labeling problem, and a conditional random field (CRF) model was trained and tested for sequential labeling of scene model elements. To the best of the authors’ knowledge, no dataset for this task exists; thus, the required dataset was collected for this task. The lack of required off-the-shelf natural language processing modules and a significant error rate in the available corpora were important challenges to dataset collection. Some features of the dataset were manually annotated. The results were evaluated using standard text classification metrics, and an average accuracy of 85.7% was obtained, which is satisfactory.

Volume 29, pp. 693-719.

Citations: 0

Quantifying the impact of context on the quality of manual hate speech annotation

Pub Date : 2022-08-22 DOI: 10.1017/s1351324922000353

Nikola Ljubesic, I. Mozetič, Petra Kralj Novak

The quality of annotations in manually annotated hate speech datasets is crucial for automatic hate speech detection. This contribution focuses on the positive effects of manually annotating online comments for hate speech within the context in which the comments occur. We quantify the impact of context availability by meticulously designing an experiment: Two annotation rounds are performed, one in-context and one out-of-context, on the same English YouTube data (more than 10,000 comments), by using the same annotation schema and platform, the same highly trained annotators, and quantifying annotation quality through inter-annotator agreement. Our results show that the presence of context has a significant positive impact on the quality of the manual annotations. This positive impact is more noticeable among replies than among comments, although the former is harder to consistently annotate overall. Previous research reporting that out-of-context annotations favour assigning non-hate-speech labels is also corroborated, showing further that this tendency is especially present among comments inciting violence, a highly relevant category for hate speech research and society overall. We believe that this work will improve future annotation campaigns even beyond hate speech and motivate further research on the highly relevant questions of data annotation methodology in natural language processing, especially in the light of the current expansion of its scope of application.
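To make the idea of quantifying annotation quality through inter-annotator agreement concrete, here is a minimal sketch using Cohen's kappa for two annotators. The abstract does not say which agreement coefficient the authors used, so kappa is a stand-in chosen for simplicity, and the labels below are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need equally sized, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labelled at random with their own label rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Invented labels for the same four comments in two hypothetical rounds.
    in_context = (["hate", "hate", "ok", "ok"], ["hate", "hate", "ok", "ok"])
    out_of_context = (["hate", "hate", "ok", "ok"], ["hate", "ok", "ok", "ok"])
    print("in-context kappa:", cohens_kappa(*in_context))
    print("out-of-context kappa:", cohens_kappa(*out_of_context))
```

Comparing the coefficient between the two rounds, with annotators, data, and schema held fixed, is what lets a difference be attributed to the availability of context.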

Citations: 3

NLE volume 28 issue 5 Cover and Back matter

Pub Date : 2022-08-09 DOI: 10.1017/s1351324922000389

Citations: 0

Emerging trends: Deep nets thrive on scale

Pub Date : 2022-08-09 DOI: 10.1017/S1351324922000365

Kenneth Ward Church

Deep nets are becoming larger and larger in practice, with no respect for (non)-factors that ought to limit growth, including the so-called curse of dimensionality (CoD). Donoho suggested that dimensionality can be a blessing as well as a curse. Current practice in industry is well ahead of theory, but there are some recent theoretical results from Weinan E’s group suggesting that errors may be independent of dimensions $d$. Current practice suggests an even stronger conjecture: deep nets are not merely immune to CoD; they actually thrive on scale.

Citations: 1

NLE volume 28 issue 5 Cover and Front matter

Pub Date : 2022-08-09 DOI: 10.1017/s1351324922000377

R. Mitkov, B. Boguraev

whether or translation, computer science or engineering. Its is to the computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing original research articles on a broad range of topics - from text analysis, machine translation, information retrieval, speech processing and generation to integrated systems and multi-modal interfaces - it also publishes special issues on specific natural language processing methods, tasks or applications. The journal welcomes survey papers describing the state of the art of a specific topic. The Journal of Natural Language Engineering also publishes the popular Industry Watch and Emerging Trends columns as well as book reviews.

Citations: 0