Pub Date: 2020-10-01 | DOI: 10.4000/books.aaccademia.8844
Title: A Case Study of Natural Gender Phenomena in Translation. A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish
Argentina Anna Rescigno, Eva Vanmassenhove, J. Monti, Andy Way
This paper presents the results of an evaluation of Google Translate, DeepL and Bing Microsoft Translator with respect to natural gender translation, and provides statistics on the frequency of female, male and neutral forms in the translations of a list of personality adjectives, nouns referring to professions, and bigender nouns. The evaluation is carried out for English→Spanish, English→Italian and English→French.
{"title":"A Case Study of Natural Gender Phenomena in Translation. A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish","authors":"Argentina Anna Rescigno, Eva Vanmassenhove, J. Monti, Andy Way","doi":"10.4000/books.aaccademia.8844","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8844","url":null,"abstract":"This paper presents the results of an evaluation of Google Translate, DeepL and Bing Microsoft Translator with reference to natural gender translation and provides statistics about the frequency of female, male and neutral forms in the translations of a list of personality adjectives, and nouns referring to professions and bigender nouns. The evaluation is carried out for English→Spanish, English→Italian and English→French.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122762140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-09-01 | DOI: 10.4000/books.aaccademia.8628
Title: How Granularity of Orthography-Phonology Mappings Affect Reading Development: Evidence from a Computational Model of English Word Reading and Spelling
A. Lim, B. O’Brien, Luca Onnis
It is widely held that children implicitly learn the structure of their writing system through statistical learning of spelling-to-sound mappings. Yet an unresolved question is how to sequence reading experience so that children can 'pick up' the structure optimally. We tackle this question here using a computational model of encoding and decoding. The order of presentation of words was manipulated so that they exhibited two distinct progressions of granularity of spelling-to-sound mappings. We found that under a training regime that introduced written words progressively from small-to-large granularity, the network exhibited an early advantage in reading acquisition compared to a regime introducing written words from large-to-small granularity. Our results thus provide support for the grain size theory (Ziegler and Goswami, 2005) and demonstrate that the order of learning can influence learning trajectories of literacy skills.
{"title":"How Granularity of Orthography-Phonology Mappings Affect Reading Development: Evidence from a Computational Model of English Word Reading and Spelling","authors":"A. Lim, B. O’Brien, Luca Onnis","doi":"10.4000/books.aaccademia.8628","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8628","url":null,"abstract":"It is widely held that children implicitly learn the structure of their writing system through statistical learning of spelling-tosound mappings. Yet an unresolved question is how to sequence reading experience so that children can ‘pick up’ the structure optimally. We tackle this question here using a computational model of encoding and decoding. The order of presentation of words was manipulated so that they exhibited two distinct progressions of granularity of spelling-to-sound mappings. We found that under a training regime that introduced written words progressively from small-to-large granularity, the network exhibited an early advantage in reading acquisition as compared to a regime introducing written words from large-to-small granularity. Our results thus provide support for the grain size theory (Ziegler and Goswami, 2005) and demonstrate that the order of learning can influence learning trajectories of literacy skills.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124370545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOI: 10.4000/books.aaccademia.8250
Title: Domain Adaptation for Text Classification with Weird Embeddings
Valerio Basile
Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task, which requires no additional resources beyond the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are with respect to the labels of a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect task-specific semantic shifts. Our experiments show that this method is beneficial to the performance of several text classification tasks in different languages.
{"title":"Domain Adaptation for Text Classification with Weird Embeddings","authors":"Valerio Basile","doi":"10.4000/books.aaccademia.8250","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8250","url":null,"abstract":"Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task, that needs no additional resource other than the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are with respect to the labels of a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect taskspecific semantic shifts. Our experiments show that this method is beneficial to the performance of several text classification tasks in different languages.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117204780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOI: 10.4000/books.aaccademia.8915
Title: Becoming JILDA
Irene Sucameli, Alessandro Lenci, B. Magnini, M. Simi, Manuela Speranza
English. The difficulty of finding useful dialogic data to train a conversational agent remains an open issue even today, when chatbots and spoken dialogue systems are widely used. For this reason, we decided to build JILDA, a novel collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one. Italian (translated). In recent years the use of chatbots and dialogue systems has become increasingly common; however, finding adequate training data for conversational agents is still an unresolved issue. For this reason we decided to produce JILDA, a new dataset of dialogues related to the job-search domain and produced via chat by native Italian speakers. JILDA constitutes the first collection of dialogues related to this domain in the Italian language. Given its methodological aspects and data-collection procedure, we believe that such a resource can be useful and interesting not only for the Italian research community but also for the international one.
{"title":"Becoming JILDA","authors":"Irene Sucameli, Alessandro Lenci, B. Magnini, M. Simi, Manuela Speranza","doi":"10.4000/books.aaccademia.8915","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8915","url":null,"abstract":"English. The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one. Italiano. Negli ultimi anni l’utilizzo di chatbot e sistemi dialogici è diventato sempre più comune; tuttavia, il reperimento di dati di apprendimento adeguati per addestrare agenti conversazionali costituisce ancora una questione irrisolta. Per questo motivo abbiamo deciso di produrre JILDA, un nuovo dataset di dialoghi relativi al dominio della ricerca del lavoro e realizzati via chat da parlanti nativi italiani. JILDA costituisce la prima collezione di dialoghi relativi a questo dominio, in lingua italiana. Per gli aspetti metodologici e la modalità di raccolta dei dati, riteniamo che una simile risorsa possa essere utile ed interessante non solo per la comunità di ricerca italiana ma anche per quella internazionale.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122178507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOI: 10.4000/books.aaccademia.9004
Title: Analyses of Character Emotions in Dramatic Works by Using EmoLex Unigrams
Mehmet Can Yavuz
In theatrical pieces, written language is the primary medium for establishing antagonisms. As one of the most important figures of the Renaissance, Shakespeare wrote characters who express themselves clearly, so the emotional landscape of the plays can be recovered from the texts. Analysing such landscapes is important for demonstrating these structures. We use a word-emotion association lexicon with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Using this lexicon, the emotional state of each character is represented in a 10-dimensional space and mapped onto a plane; the principal-axis planes position the characters relative to one another. Additionally, a temporal-emotional evaluation of each play is graphed. We conclude that the protagonist and the antagonist have emotional states different from those of the rest of the cast, and that the two emotionally oppose each other. The temporal-emotional timelines of the plays provide better insight into the tragedies.
{"title":"Analyses of Character Emotions in Dramatic Works by Using EmoLex Unigrams","authors":"Mehmet Can Yavuz","doi":"10.4000/books.aaccademia.9004","DOIUrl":"https://doi.org/10.4000/books.aaccademia.9004","url":null,"abstract":"In theatrical pieces, written language is the primary medium for establishing antagonisms. As one of the most important figures of renaissance, Shakespeare wrote characters which express themselves clearly. Thus, the emotional landscape of the plays can be revealed from the texts. It is important to analyze such landscapes for further demonstrating these structures. We use word-emotion association lexicon with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). By using this lexicon, the emotional state of each character is represented in 10 dimensional space and mapped onto a plane. This principle axes planes position each character relatively. Additionally, tempora-emotional evaluation of each play is graphed. We conclude that the protagonist and the antagonist have different emotional states from the rest and these two emotionally oppose each other. Temporal-Emotional timeline of the plays are meaningful to have a better insight into the tragedies.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126292994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOI: 10.4000/books.aaccademia.8260
Title: The "Corpus Anchise 320" and the Analysis of Conversations between Healthcare Workers and People with Dementia
Nicola Benvenuti, Andrea Bolioli, A. Mazzei, Pietro Vigorelli, A. Bosca
The aim of this research was to create the first Italian corpus of free conversations between healthcare workers and people with dementia, in order to investigate specific linguistic phenomena from a computational point of view. Most previous research on the speech disorders of people with dementia has been based on qualitative analysis, or on the study of a few dozen cases recorded under laboratory conditions rather than in spontaneous speech (in particular for the Italian language). The creation of the Corpus Anchise 320 aims to support the investigation of dementia language by providing a larger number of dialogues collected in ecological conditions and obtained by transcribing spontaneous speech. Moreover, quantitative linguistic analysis can reveal some peculiarities of this language.
{"title":"The \"Corpus Anchise 320\" and the Analysis of Conversations between Healthcare Workers and People with Dementia","authors":"Nicola Benvenuti, Andrea Bolioli, A. Mazzei, Pietro Vigorelli, A. Bosca","doi":"10.4000/books.aaccademia.8260","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8260","url":null,"abstract":"The aim of this research was to create the first Italian corpus of free conversations between healthcare workers and people with dementia, in order to investigate specific linguistic phenomena from a computational point of view. Most of the previous researches on speech disorders of people with dementia have been based on qualitative analysis, or on the study of a few dozen cases executed in laboratory conditions, and not in spontaneous speech (in particular for the Italian language). The creation of the Corpus Anchise 320 aims to investigate Dementia language by providing a broader number of dialogues collected in ecological conditions and obtained transcribing spontaneous speech. Moreover, quantitative linguistic analysis can show some peculiarities of this language.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124240892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOI: 10.4000/books.aaccademia.8830
Title: (Stem and Word) Predictability in Italian Verb Paradigms: An Entropy-Based Study Exploiting the New Resource LeFFI
Matteo Pellegrini, A. T. Cignarella
In this paper we present LeFFI, an inflected lexicon of Italian listing all the available wordforms of 2,053 verbs. We then use this resource to perform an entropy-based analysis of the mutual predictability of wordforms within Italian verb paradigms, and compare our findings to those of previous work on stem predictability in Italian verb inflection.
{"title":"(Stem and Word) Predictability in Italian Verb Paradigms: An Entropy-Based Study Exploiting the New Resource LeFFI","authors":"Matteo Pellegrini, A. T. Cignarella","doi":"10.4000/books.aaccademia.8830","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8830","url":null,"abstract":"English. In this paper we present LeFFI, an inflected lexicon of Italian listing all the available wordforms of 2,053 verbs. We then use this resource to perform an entropy-based analysis of the mutual predictability of wordforms within Italian verb paradigms, and compare our findings to the ones of previous work on stem predictability in Italian verb inflection.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114855565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}