
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020: Latest Papers

A Case Study of Natural Gender Phenomena in Translation. A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish
Pub Date : 2020-10-01 DOI: 10.4000/books.aaccademia.8844
Argentina Anna Rescigno, Eva Vanmassenhove, J. Monti, Andy Way
This paper presents the results of an evaluation of Google Translate, DeepL and Bing Microsoft Translator with reference to natural gender translation and provides statistics about the frequency of female, male and neutral forms in the translations of a list of personality adjectives, and nouns referring to professions and bigender nouns. The evaluation is carried out for English→Spanish, English→Italian and English→French.
Citations: 19
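The frequency statistics described in the abstract above can be illustrated with a short tally. This is only a sketch: the function name `form_shares`, the input format and the toy annotations are assumptions made for illustration, and none of the numbers are results from the paper.

```python
from collections import Counter

def form_shares(annotations):
    """Return {system: {gender_label: share}} from (system, label) pairs."""
    per_system = {}
    for system, label in annotations:
        per_system.setdefault(system, Counter())[label] += 1
    return {
        system: {label: n / sum(counts.values()) for label, n in counts.items()}
        for system, counts in per_system.items()
    }

# Made-up annotations of translated forms, for illustration only.
toy = [
    ("Google Translate", "male"),
    ("Google Translate", "male"),
    ("Google Translate", "female"),
    ("DeepL", "neutral"),
]
shares = form_shares(toy)
```

Each inner dictionary sums to 1, so the shares of female, male and neutral forms can be compared across systems and language pairs.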
How Granularity of Orthography-Phonology Mappings Affect Reading Development: Evidence from a Computational Model of English Word Reading and Spelling
Pub Date : 2020-09-01 DOI: 10.4000/books.aaccademia.8628
A. Lim, B. O’Brien, Luca Onnis
It is widely held that children implicitly learn the structure of their writing system through statistical learning of spelling-to-sound mappings. Yet an unresolved question is how to sequence reading experience so that children can ‘pick up’ the structure optimally. We tackle this question here using a computational model of encoding and decoding. The order of presentation of words was manipulated so that they exhibited two distinct progressions of granularity of spelling-to-sound mappings. We found that under a training regime that introduced written words progressively from small-to-large granularity, the network exhibited an early advantage in reading acquisition as compared to a regime introducing written words from large-to-small granularity. Our results thus provide support for the grain size theory (Ziegler and Goswami, 2005) and demonstrate that the order of learning can influence learning trajectories of literacy skills.
Citations: 0
Domain Adaptation for Text Classification with Weird Embeddings
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8250
Valerio Basile
Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task; it needs no resource other than the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are with respect to the labels of a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect task-specific semantic shifts. Our experiments show that this method is beneficial to the performance of several text classification tasks in different languages.
Citations: 3
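The label-based weirdness score mentioned in the abstract above might look roughly like the following. This is a minimal sketch under stated assumptions: the score is taken to be the ratio between a word's relative frequency in documents carrying a target label and its relative frequency in the remaining documents, and the `smoothing` constant is a hypothetical addition to avoid division by zero. The embedding update step is not shown, since the abstract does not specify it.

```python
from collections import Counter

def relative_freqs(docs):
    """Return {word: relative frequency} over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def polarized_weirdness(docs, labels, target, smoothing=1e-6):
    """Score each word by how characteristic it is of `target`-labeled docs.

    Assumption: score = relative frequency inside the target class divided by
    relative frequency outside it, plus a small hypothetical smoothing term.
    """
    inside = relative_freqs([d for d, y in zip(docs, labels) if y == target])
    outside = relative_freqs([d for d, y in zip(docs, labels) if y != target])
    return {w: f / (outside.get(w, 0.0) + smoothing) for w, f in inside.items()}

# Toy training set, invented for illustration.
docs = [["you", "are", "awful"], ["you", "are", "kind"], ["awful", "awful", "day"]]
labels = ["neg", "pos", "neg"]
scores = polarized_weirdness(docs, labels, target="neg")
```

Words that occur only in the target class receive very large scores, while words spread evenly across classes score near 1 or below.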
#andràtuttobene: Images, Texts, Emojis and Geodata in a Sentiment Analysis Pipeline
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8954
Pierluigi Vitale, Serena Pelosi, M. Falco
This research investigates the sentiment expressed by Instagram users during the lockdown period in Italy caused by the COVID-19 pandemic. The study is based on the analysis of all the posts published on Instagram under the hashtag #andratuttobene on May 4, May 18 and June 3, 2020. Our research took a national, regional and provincial view. We analyzed all the different languages and forms (i.e. captions, hashtags, emojis and images) that constitute the posts. The aim of this research is to provide a set of procedures revealing the different polarity trends for each kind of expression and to propose a single comprehensive measure. Copyright © 2020 for this paper by its authors.
Citations: 5
Becoming JILDA
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8915
Irene Sucameli, Alessandro Lenci, B. Magnini, M. Simi, Manuela Speranza
English. The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one. Italiano. Negli ultimi anni l’utilizzo di chatbot e sistemi dialogici è diventato sempre più comune; tuttavia, il reperimento di dati di apprendimento adeguati per addestrare agenti conversazionali costituisce ancora una questione irrisolta. Per questo motivo abbiamo deciso di produrre JILDA, un nuovo dataset di dialoghi relativi al dominio della ricerca del lavoro e realizzati via chat da parlanti nativi italiani. JILDA costituisce la prima collezione di dialoghi relativi a questo dominio, in lingua italiana. Per gli aspetti metodologici e la modalità di raccolta dei dati, riteniamo che una simile risorsa possa essere utile ed interessante non solo per la comunità di ricerca italiana ma anche per quella internazionale.
Citations: 2
Surviving the Legal Jungle: Text Classification of Italian Laws in Extremely Noisy Conditions
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8390
Riccardo Coltrinari, Alessandro Antinori, Fabio Celli
In this paper, we present a method based on Linear Discriminant Analysis for legal text classification of extremely noisy data, such as duplicated documents classified in different classes. The results show that Linear Discriminant Analysis performs very well in both clean and noisy conditions when used as a classifier in ensemble learning and in multi-label text classification.
1 Motivation and Background
We address text categorization of business-oriented legal documents in Italian, with a custom and overlapping hierarchy of product categories. A typical approach to tackle similar tasks is to exploit resources such as EUROVOC (Daudaravicius, 2012), a multilingual thesaurus consisting of over 6700 hierarchically organised class descriptors used by many organizations of the European Union (EU) for the classification and retrieval of official documents. Our editorial system has a hierarchy of 23 product categories and more than 20600 labels, manually annotated and customized for different clients over more than 15 years, hence it is not possible to exploit resources like EUROVOC to categorize documents. In this paper, we propose a fast and efficient method for classifying noisy documents based on Linear Discriminant Analysis, a dimensionality reduction technique that has been employed successfully in many domains, including neuroimaging and medicine. We believe that our contribution will be useful to the NLP community in the context of document categorization as well as automatic ontology population, in particular when dealing with very noisy data. The paper is structured as follows: in Section 1.1 we present related work in the field of text classification and the potential of Linear Discriminant Analysis, in Section 2 we describe the datasets we used, in Section 3 we report and discuss the results of our classification experiments, and in Section 4 we draw our conclusions.
Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Citations: 0
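The kind of noise named in the abstract above, duplicated documents filed under different classes, can be made concrete with a small detection routine. This is an illustrative sketch and not the authors' procedure: the function name `conflicting_duplicates`, the normalization rule and the sample records are all assumptions.

```python
from collections import defaultdict

def conflicting_duplicates(records):
    """Group records by normalized text; return texts that carry >1 label."""
    labels_by_text = defaultdict(set)
    for text, label in records:
        key = " ".join(text.lower().split())  # collapse case and whitespace
        labels_by_text[key].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}

# Invented records: the first two are the same document with different labels.
records = [
    ("Decreto  legislativo n. 1", "tax"),
    ("decreto legislativo n. 1", "labour"),
    ("Legge n. 2", "tax"),
]
noisy = conflicting_duplicates(records)
```

One possible follow-up, again an assumption rather than something the abstract states, is to merge each conflicting group into a single multi-label training example instead of discarding it.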
Italian Counter Narrative Generation to Fight Online Hate Speech
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8378
Yi-Ling Chung, Serra Sinem Tekiroğlu, Marco Guerini
English. Counter Narratives are textual responses meant to withstand online hatred and prevent its spreading. The use of neural architectures for the generation of Counter Narratives (CNs) is beginning to be investigated by the NLP community. Still, efforts so far have targeted English only. In this paper, we try to fill the gap for Italian, studying how to implement CN generation approaches effectively. We experiment with an existing dataset of CNs and a novel language model, recently released for Italian, under several configurations, including zero- and few-shot learning. Results show that even for under-resourced languages, data augmentation strategies paired with large unsupervised LMs can yield promising results. Italiano. Le Contro Narrative sono risposte testuali volte a contrastare l’odio online e a prevenirne la diffusione. La comunità di NLP ha iniziato a studiare l’uso di architetture neurali per la generazione di CN. Tuttavia, gli sforzi sono stati rivolti esclusivamente all’inglese. In questo lavoro, cerchiamo di colmare la lacuna per l’italiano, mostrando come implementare efficacemente approcci di generazione di CN. Sperimentiamo con un dataset esistente di CN e un modello del linguaggio per l’italiano recentemente rilasciato, in diverse configurazioni, tra cui zero e few shot learning. I risultati mostrano che anche per lingue con poche risorse, strategie di data augmentation abbinate a potenti modelli del linguaggio possono offrire risultati promettenti. Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Citations: 11
Analyses of Character Emotions in Dramatic Works by Using EmoLex Unigrams
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.9004
Mehmet Can Yavuz
In theatrical pieces, written language is the primary medium for establishing antagonisms. As one of the most important figures of the Renaissance, Shakespeare wrote characters who express themselves clearly. Thus, the emotional landscape of the plays can be revealed from the texts, and analyzing such landscapes helps to expose their structure. We use a word-emotion association lexicon with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Using this lexicon, the emotional state of each character is represented in a 10-dimensional space and mapped onto a plane. The plane spanned by the principal axes positions the characters relative to one another. Additionally, the temporal-emotional evolution of each play is graphed. We conclude that the protagonist and the antagonist have emotional states that differ from those of the other characters, and that the two emotionally oppose each other. The temporal-emotional timelines of the plays offer better insight into the tragedies.
Citations: 6
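The 10-dimensional character representation described in the abstract above can be sketched as follows. The ten category names come from the abstract; everything else is an assumption for illustration, in particular the three-word `TOY_LEXICON` (the real lexicon is far larger), the function name `emotion_profile` and the choice to normalize counts to shares.

```python
from collections import Counter

EMOTIONS = ["anger", "fear", "anticipation", "trust", "surprise",
            "sadness", "joy", "disgust", "negative", "positive"]

# Hypothetical toy lexicon mapping a word to its associated categories.
TOY_LEXICON = {
    "murder": {"anger", "fear", "sadness", "negative"},
    "love": {"joy", "trust", "positive"},
    "ghost": {"fear", "negative"},
}

def emotion_profile(tokens, lexicon=TOY_LEXICON):
    """Return a 10-dimensional vector of normalized emotion counts."""
    counts = Counter()
    for tok in tokens:
        counts.update(lexicon.get(tok.lower(), ()))
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in EMOTIONS]

# All lines spoken by one character would be tokenized and passed in.
profile = emotion_profile("Murder most foul and a ghost".split())
```

Stacking one such vector per character gives a matrix that a projection such as principal component analysis could reduce to two dimensions for plotting; that step is omitted here.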
The "Corpus Anchise 320" and the Analysis of Conversations between Healthcare Workers and People with Dementia
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8260
Nicola Benvenuti, Andrea Bolioli, A. Mazzei, Pietro Vigorelli, A. Bosca
The aim of this research was to create the first Italian corpus of free conversations between healthcare workers and people with dementia, in order to investigate specific linguistic phenomena from a computational point of view. Most previous research on speech disorders of people with dementia has been based on qualitative analysis, or on the study of a few dozen cases carried out in laboratory conditions rather than on spontaneous speech (in particular for the Italian language). The Corpus Anchise 320 was created to investigate the language of dementia by providing a larger number of dialogues collected in ecological conditions and obtained by transcribing spontaneous speech. Moreover, quantitative linguistic analysis can reveal some peculiarities of this language.
Citations: 0
(Stem and Word) Predictability in Italian Verb Paradigms: An Entropy-Based Study Exploiting the New Resource LeFFI
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8830
Matteo Pellegrini, A. T. Cignarella
In this paper we present LeFFI, an inflected lexicon of Italian that lists all the available wordforms of 2,053 verbs. We then use this resource to perform an entropy-based analysis of the mutual predictability of wordforms within Italian verb paradigms, and compare our findings with those of previous work on stem predictability in Italian verb inflection.
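As a rough illustration of what an entropy-based predictability measure can look like, the sketch below estimates the conditional entropy of the alternation between two paradigm cells (infinitive and first person singular present indicative) given the ending of the infinitive. This is only a minimal sketch under stated assumptions, not necessarily the procedure used in the paper: the eight-verb toy list, the function names `alternation` and `conditional_entropy`, the three-character context, and the use of type counts are all hypothetical choices made for the example, and LeFFI itself is not used.

```python
from collections import Counter, defaultdict
from math import log2
from os.path import commonprefix
from typing import Iterable, Tuple

# Hypothetical toy data (not taken from LeFFI): (infinitive, 1sg present).
TOY_PAIRS = [
    ("parlare", "parlo"), ("cantare", "canto"),
    ("credere", "credo"), ("vedere", "vedo"),
    ("dormire", "dormo"), ("sentire", "sento"),
    ("finire", "finisco"), ("capire", "capisco"),
]


def alternation(source: str, target: str) -> Tuple[str, str]:
    """Return the (source, target) endings left after removing the shared prefix."""
    stem = commonprefix([source, target])
    return source[len(stem):], target[len(stem):]


def conditional_entropy(pairs: Iterable[Tuple[str, str]], context_len: int = 3) -> float:
    """Estimate H(alternation | last `context_len` characters of the source), in bits."""
    by_context = defaultdict(Counter)
    for source, target in pairs:
        by_context[source[-context_len:]][alternation(source, target)] += 1

    total = sum(sum(counts.values()) for counts in by_context.values())
    entropy = 0.0
    for counts in by_context.values():
        n = sum(counts.values())
        for k in counts.values():
            p = k / n
            # Weight each context by its relative frequency n / total.
            entropy -= (n / total) * p * log2(p)
    return entropy


if __name__ == "__main__":
    print(conditional_entropy(TOY_PAIRS))
```

In this toy list the infinitives in -are and -ere each show a single alternation, while those in -ire split evenly between the `dormo` and `finisco` types, so all of the uncertainty comes from the -ire class. A lower value means the target cell is easier to predict from the source cell.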
Citations: 2