
Recent Advances in Natural Language Processing: Latest Publications

Personality-dependent Neural Text Summarization
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_024
P. Costa, Ivandré Paraboni
In Natural Language Generation systems, personalization strategies - i.e., the use of information about a target author to generate text that (more) closely resembles human-produced language - have long been applied to improve results. The present work addresses one such strategy - namely, the use of Big Five personality information about the target author - applied to the case of abstractive text summarization using neural sequence-to-sequence models. Initial results suggest that having access to personality information does lead to more accurate (or human-like) text summaries, and paves the way for more robust systems of this kind.
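A minimal sketch of how such personality conditioning might look (an illustrative assumption, not the authors' architecture): the five trait scores are projected into a small vector that is concatenated to every decoder input of a sequence-to-sequence model.

```python
# Sketch (assumption, not the paper's model): a seq2seq summarizer whose
# decoder is conditioned on projected Big Five personality scores.
import torch
import torch.nn as nn

class PersonalitySeq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, big_five_dim=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Project the 5 trait scores into a small conditioning vector.
        self.pers_proj = nn.Linear(big_five_dim, 32)
        self.decoder = nn.GRU(emb_dim + 32, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids, big_five):
        _, h = self.encoder(self.embed(src_ids))      # encode source document
        pers = torch.tanh(self.pers_proj(big_five))   # (batch, 32)
        pers = pers.unsqueeze(1).expand(-1, tgt_ids.size(1), -1)
        dec_in = torch.cat([self.embed(tgt_ids), pers], dim=-1)
        dec_out, _ = self.decoder(dec_in, h)          # condition every step
        return self.out(dec_out)                      # logits over vocabulary

# Toy usage: a batch of 2 documents with Big Five scores in [0, 1].
model = PersonalitySeq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 40))
tgt = torch.randint(0, 1000, (2, 12))
traits = torch.rand(2, 5)
logits = model(src, tgt, traits)   # (2, 12, 1000)
```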
Citations: 1
Bigger versus Similar: Selecting a Background Corpus for First Story Detection Based on Distributional Similarity
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_150
Fei Wang, R. Ross, John D. Kelleher
The current state of the art for First Story Detection (FSD) consists of nearest-neighbour-based models with traditional term vector representations; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequencies of a background corpus. Consequently, the ideal background corpus should arguably be both large-scale, to ensure adequate term coverage, and similar to the target domain in terms of language distribution. However, given that these two factors cannot always be satisfied simultaneously, in this paper we examine whether the distributional similarity of common terms is more important for FSD than the scale of common terms. As a basis for our analysis we propose a set of metrics to quantitatively measure the scale of common terms and the distributional similarity between corpora. Using these metrics we rank different background corpora relative to a target corpus. We also apply models based on different background corpora to the FSD task. Our results show that term distributional similarity is more predictive of good FSD performance than the scale of common terms; we thus demonstrate that a smaller, recent, domain-related corpus will be more suitable for FSD than a very large-scale general corpus.
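A hedged sketch of the corpus-ranking idea follows. The paper proposes its own metrics; the concrete choices below (common-vocabulary size for scale, Jensen-Shannon divergence over unigram distributions for distributional similarity) are illustrative assumptions, not the authors' definitions.

```python
# Rank candidate background corpora against a target corpus by
# distributional similarity (JS divergence, lower is more similar),
# breaking ties by the scale of common terms.
from collections import Counter
import math

def unigram_dist(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in vocab}
    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def rank_background_corpora(target_tokens, candidates):
    """candidates: dict mapping corpus name -> token list."""
    p = unigram_dist(target_tokens)
    scored = []
    for name, tokens in candidates.items():
        q = unigram_dist(tokens)
        common = len(set(p) & set(q))   # scale of common terms
        scored.append((js_divergence(p, q), -common, name))
    return sorted(scored)               # most similar corpus first
```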
Citations: 2
Quotation Detection and Classification with a Corpus-Agnostic Model
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_103
Sean Papay, Sebastian Padó
The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorporate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state of the art on the corpora when using established feature sets and (b) shows reasonable performance even when using solely word forms, which makes it applicable to non-standard (i.e., historical) corpora.
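The word-forms-only setting could be sketched as follows (an illustrative tagger under assumed BIO labels, not the paper's model): a BiLSTM assigns each token a tag marking quotation spans, using nothing but word embeddings as input.

```python
# Sketch (assumption): BIO tagging of quotation spans from word forms only.
import torch
import torch.nn as nn

class QuoteTagger(nn.Module):
    NUM_TAGS = 3   # B(egin-quote), I(nside-quote), O(utside)

    def __init__(self, vocab_size, emb_dim=100, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid_dim, self.NUM_TAGS)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)   # per-token tag logits

model = QuoteTagger(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 30)))   # (2, 30, 3)
```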
Citations: 10
Detecting Clitics Related Orthographic Errors in Turkish
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_009
Uğurcan Arıkan, Onur Güngör, S. Uskudarli
For the spell correction task, vocabulary-based methods have been replaced with methods that take morphological and grammar rules into account. However, such tools are fairly immature and, worse, non-existent for many low-resource languages. Checking only whether a word is well-formed with respect to the morphological rules of a language may produce false negatives, owing to the ambiguity caused by numerous homophonic words. In this work, we propose an approach to detect and correct the “de/da” clitic errors in Turkish text. Our model is a neural sequence tagger trained on a synthetically constructed dataset consisting of positive and negative samples. The model’s performance on this dataset is reported for different word embedding configurations. The model achieved an F1 score of 86.67% on the synthetically constructed dataset. We also evaluated the model on a manually curated dataset of challenging samples, where it proved superior to other spelling correctors, achieving 71% accuracy compared to 34% for the second-best (Google Docs).
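A hedged sketch of how the synthetic dataset could be constructed (an assumption about the setup, not the published code): correct Turkish sentences are corrupted by wrongly attaching or detaching the “de”/“da” clitic, and each token receives a label for the sequence tagger.

```python
# Inject synthetic "de/da" clitic errors into correct Turkish sentences
# and emit per-token labels (1 = injected error) for tagger training.
import random

CLITICS = ("de", "da")

def corrupt(tokens):
    """Return (tokens, labels) with randomly injected clitic errors."""
    out, labels = [], []
    for tok in tokens:
        r = random.random()
        if tok in CLITICS and out and r < 0.5:
            # Error type 1: wrongly attach a standalone clitic
            # to the previous word ("ben de" -> "bende").
            out[-1] = out[-1] + tok
            labels[-1] = 1
        elif tok.endswith(CLITICS) and len(tok) > 2 and r < 0.3:
            # Error type 2: wrongly split a word-final "de"/"da"
            # off as a clitic ("ayakkabıda" -> "ayakkabı da").
            out.extend([tok[:-2], tok[-2:]])
            labels.extend([0, 1])
        else:
            out.append(tok)
            labels.append(0)
    return out, labels

sent = "ben de geldim ayakkabıda çamur var".split()
print(corrupt(sent))
```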
Citations: 6
Exploiting Open IE for Deriving Multiple Premises Entailment Corpus
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_144
Martin Vita, Jakub Klímek
Natural language inference (NLI) is a key part of natural language understanding. The NLI task is defined as the decision problem of whether a given sentence, the hypothesis, can be inferred from a given text. Typically, we deal with a text consisting of just a single premise/single sentence, which is called a single premise entailment (SPE) task. Recently, a derived task of NLI from multiple premises (MPE) was introduced, together with a first annotated corpus and several strong baselines. Nevertheless, further development in the MPE field requires access to large amounts of annotated data. In this paper we introduce a novel method for rapidly deriving MPE corpora from existing annotated NLI (SPE) data that does not require any additional annotation work. The proposed approach is based on an open information extraction system. We demonstrate the application of the method on the well-known SNLI corpus. Over the obtained corpus, we provide first evaluations and establish a strong baseline.
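The derivation idea might look like the following sketch (assumed data shapes, not the authors' pipeline): each (subject, relation, object) triple extracted from an SPE premise by an open IE system is verbalised as one premise of a new MPE example, reusing the original hypothesis and label.

```python
# Turn an SPE example into an MPE example from its open IE triples.
def spe_to_mpe(triples, hypothesis, label):
    """triples: open IE output for the original single premise."""
    if len(triples) < 2:                  # need several premises for MPE
        return None
    premises = [f"{subj} {rel} {obj}." for subj, rel, obj in triples]
    return {"premises": premises, "hypothesis": hypothesis, "label": label}

# Usage with hand-written triples standing in for open IE output:
triples = [("a man", "is wearing", "a red shirt"),
           ("a man", "is holding", "a guitar")]
print(spe_to_mpe(triples, "A man plays music outdoors.", "neutral"))
```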
Citations: 1
Assessing socioeconomic status of Twitter users: A survey
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_046
Dhouha Ghazouani, L. Lancieri, Habib Ounelli, J. Chaker
Every day, the emotions and opinions of people across the world are reflected in short messages posted on microblogging platforms. Despite the enormous potential of this data source, the Twitter community remains ambiguous and not yet fully explored. While a great number of studies have examined the possibilities of inferring gender and age, there is hardly any research on inferring the socioeconomic status (SES) of Twitter users. As socioeconomic status is essential to addressing diverse questions linked to human behavior in several fields (sociology, demography, public health, etc.), we conducted a comprehensive literature review of SES studies, inference methods, and metrics. Drawing on the results reported in the literature, we outline the most critical challenges for researchers. To the best of our knowledge, this paper is the first review to introduce the different aspects of SES inference. The article thus benefits practitioners who aim to process and explore SES inference on Twitter.
Citations: 7
A Qualitative Evaluation Framework for Paraphrase Identification
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_067
Venelin Kovatchev, M. A. Martí, Maria Salamó, Javier Beltrán
In this paper, we present a new approach for the evaluation, error analysis, and interpretation of supervised and unsupervised Paraphrase Identification (PI) systems. Our evaluation framework makes use of a PI corpus annotated with linguistic phenomena to provide a better understanding and interpretation of the performance of various PI systems. Our approach allows for a qualitative evaluation and comparison of PI models using human-interpretable categories. It does not require modification of the systems' training objectives and does not place an additional burden on developers. We replicate several popular supervised and unsupervised PI systems. Using our evaluation framework we show that: 1) each system performs differently with respect to a set of linguistic phenomena and makes qualitatively different kinds of errors; 2) some linguistic phenomena are more challenging than others across all systems.
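The core of such a framework might be sketched as below (the data layout is an assumption, not the authors' code): given gold labels, system predictions, and per-example phenomenon annotations, report accuracy broken down by linguistic phenomenon.

```python
# Per-phenomenon accuracy breakdown for a paraphrase identification system.
from collections import defaultdict

def per_phenomenon_accuracy(examples):
    """examples: list of dicts with 'gold', 'pred', 'phenomena' keys."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        for ph in ex["phenomena"]:
            totals[ph] += 1
            hits[ph] += int(ex["gold"] == ex["pred"])
    return {ph: hits[ph] / totals[ph] for ph in totals}

examples = [
    {"gold": 1, "pred": 1, "phenomena": ["synonymy"]},
    {"gold": 1, "pred": 0, "phenomena": ["negation", "synonymy"]},
    {"gold": 0, "pred": 0, "phenomena": ["negation"]},
]
print(per_phenomenon_accuracy(examples))  # {'synonymy': 0.5, 'negation': 0.5}
```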
Citations: 11
Moral Stance Recognition and Polarity Classification from Twitter and Elicited Text
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_123
W. Santos, Ivandré Paraboni
We introduce a labelled corpus of stances on moral issues for the Brazilian Portuguese language, and present reference results for both the stance recognition and polarity classification tasks. The corpus is built from Twitter and further expanded with data elicited through crowdsourcing and labelled by its own authors. Together, the corpus and reference results are intended to serve as a baseline for further studies in the field of stance recognition and polarity classification from text.
Citations: 18
The Impact of Rule-Based Text Generation on the Quality of Abstractive Summaries
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_146
Tatiana Vodolazova, Elena Lloret
In this paper we describe how an abstractive text summarization method improved the informativeness of automatic summaries by integrating syntactic text simplification, subject-verb-object concept frequency scoring, and a set of rules that transform text into its semantic representation. We analyzed the impact of each component of our approach on the quality of the generated summaries and tested it on the DUC 2002 dataset. Our experiments showed that our approach outperformed other state-of-the-art abstractive methods while maintaining acceptable linguistic quality and redundancy rate.
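One component the abstract names, subject-verb-object concept frequency scoring, might be sketched as follows (the scoring function is an illustrative assumption; spaCy and its en_core_web_sm model are assumed to be installed).

```python
# Extract (subject, verb, object) concept triples with spaCy and rank
# sentences by the corpus frequency of the concepts they express.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def rank_sentences(text):
    doc = nlp(text)
    counts, sent_triples = Counter(), []
    for sent in doc.sents:
        triples = []
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c.lemma_ for c in tok.children
                            if c.dep_ == "nsubj"]
                objects = [c.lemma_ for c in tok.children
                           if c.dep_ in ("dobj", "obj")]
                triples += [(s, tok.lemma_, o)
                            for s in subjects for o in objects]
        counts.update(triples)
        sent_triples.append((sent.text, triples))
    # Sentences expressing frequent SVO concepts come first.
    return sorted(sent_triples,
                  key=lambda st: -sum(counts[t] for t in st[1]))
```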
Citations: 4
Quasi Bidirectional Encoder Representations from Transformers for Word Sense Disambiguation
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_015
Michele Bevilacqua, Roberto Navigli
While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not yet benefited from them. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations that are better suited to the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model using ELMo.
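A generic co-attentive layer could look like the sketch below (illustrative only; QBERT's actual layer is defined in the paper): two sequences attend to each other through a shared affinity matrix, so each side's representation is informed by the other.

```python
# Sketch (assumption): co-attention between two sequences via a shared
# affinity matrix, softmax-normalised in each direction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)

    def forward(self, a, b):
        # a: (batch, len_a, dim), b: (batch, len_b, dim)
        scores = torch.bmm(self.affinity(a), b.transpose(1, 2))
        # b summarised for each position of a, and vice versa:
        a_ctx = torch.bmm(F.softmax(scores, dim=2), b)
        b_ctx = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), a)
        return a_ctx, b_ctx

layer = CoAttention(dim=64)
a, b = torch.randn(2, 7, 64), torch.randn(2, 9, 64)
a_ctx, b_ctx = layer(a, b)   # (2, 7, 64), (2, 9, 64)
```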
Citations: 18