AriEmozione: Identifying Emotions in Opera Verses
DOI: 10.4000/books.aaccademia.8528
Francesco Fernicola, Shibingfeng Zhang, F. Garcea, P. Bonora, Alberto Barrón-Cedeño
We present a new task: the identification of the emotions conveyed in Italian opera arias at the verse level. This is a relevant problem for organizing the vast repertoire of Italian opera arias available and for enabling further analyses by both musicologists and the lay public. We frame the task as a multi-class supervised problem, considering six emotions: love, joy, admiration, anger, sadness, and fear. To address it, we manually annotated an opera corpus of 2.5k verses, which we release to the research community, and experimented with different classification models and representations. Our best-performing models reach macro-averaged F1 measures of ~0.45, always with character 3-gram representations. Such performance reflects the difficulty of the task at hand, partially caused by the size and nature of the corpus, which consists of relatively short verses written in 18th-century Italian.
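The abstract does not include code; as a rough illustration of the kind of pipeline it describes (character 3-gram features feeding a classifier, evaluated with macro-averaged F1), here is a minimal sketch using scikit-learn. The toy verses, labels, and the choice of a linear SVM are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of a verse-level emotion classifier over character
# 3-grams, in the spirit of the paper's best-performing representation.
# Toy data and the LinearSVC choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

verses = ["Ombra mai fu di vegetabile", "Furie terribili, circondatemi"]  # toy examples
labels = ["love", "anger"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 3)),  # character 3-grams
    LinearSVC(),
)
clf.fit(verses, labels)
preds = clf.predict(verses)
print(f1_score(labels, preds, average="macro"))  # macro-averaged F1, as reported
```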
{"title":"AriEmozione: Identifying Emotions in Opera Verses","authors":"Francesco Fernicola, Shibingfeng Zhang, F. Garcea, P. Bonora, Alberto Barrón-Cedeño","doi":"10.4000/books.aaccademia.8528","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8528","url":null,"abstract":"We present a new task: the identification of the emotions transmitted in Italian opera arias at the verse level. This is a relevant problem for the organization of the vast repertoire of Italian Opera arias available and to enable further analyses by both musicologists and the lay public. We shape the task as a multi-class supervised problem, considering six emotions: love, joy, admiration, anger, sadness, and fear. In order to address it, we manually-annotated an opera corpus with 2.5k verses —which we release to the research community— and experimented with different classification models and representations. Our best-performing models reach macroaveraged F1 measures of ∼0.45, always considering character 3-grams representations. Such performance reflects the difficulty of the task at hand, partially caused by the size and nature of the corpus, which consists of relatively short verses written in 18thcentury Italian.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122620525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Good are Humans at Native Language Identification? A Case Study on Italian L2 writings
DOI: 10.4000/books.aaccademia.8475
Elisa Di Nuovo, C. Bosco, E. Corino
In this paper we present a pilot study of human performance on the Native Language Identification task. We ran two tests aimed at establishing a human baseline for the task, in which test takers had to identify the writers' L1 relying only on texts written in Italian by English, French, German, and Spanish native speakers. We then conducted an error analysis considering the language background of both the test takers and the text writers.
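To make the evaluation concrete, a hedged sketch follows of how a human baseline of this kind can be scored: each test taker's guess is compared against the writer's true L1, and a confusion count shows which L1 pairs get mixed up. The answer data below is invented purely for illustration.

```python
# Sketch of scoring a human NLI baseline: accuracy plus a confusion
# count over the four L1s considered in the paper. Toy answers only.
from collections import Counter

# (true L1 of the writer, L1 guessed by the test taker) -- invented data
answers = [("English", "English"), ("French", "Spanish"),
           ("German", "German"), ("Spanish", "French")]

accuracy = sum(true == guessed for true, guessed in answers) / len(answers)
confusion = Counter(answers)  # maps (true, guessed) pairs to counts
print(f"accuracy: {accuracy:.2f}")
for (true, guessed), n in sorted(confusion.items()):
    print(f"{true} -> {guessed}: {n}")
```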
{"title":"How Good are Humans at Native Language Identification? A Case Study on Italian L2 writings","authors":"Elisa Di Nuovo, C. Bosco, E. Corino","doi":"10.4000/books.aaccademia.8475","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8475","url":null,"abstract":"In this paper we present a pilot study on human performance for the Native Language Identification task. We performed two tests aimed at exploring the human baseline for the task in which test takers had to identify the writers’ L1 relying only on scripts written in Italian by English, French, German and Spanish native speakers. Then, we conducted an error analysis considering the language background of both test takers and text writers.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125986647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grounded and Ungrounded Referring Expressions in Human Dialogues: Language Mirrors Different Grounding Conditions
DOI: 10.4000/books.aaccademia.8600
Eleonora Gualdoni, R. Bernardi, R. Fernández, Sandro Pezzelle
We study how language use differs between dialogue partners in a visually grounded reference task when a referent is mutually identifiable by both interlocutors vs. when it is available to only one of them. In the latter case, the addressee needs to disconfirm a proposed description, a skill largely neglected by both the theoretical and the computational linguistics communities. We consider a number of linguistic features that we expect to vary across conditions, and we analyze their effectiveness in distinguishing between the two conditions by means of statistical tests and a feature-based classifier. Overall, we show that language mirrors the different grounding conditions, paving the way for deeper future investigation of referential disconfirmation.
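As a rough sketch of the kind of analysis described (statistical tests on linguistic features, then a feature-based classifier over the two grounding conditions), the snippet below runs a Mann-Whitney U test on one feature and fits a logistic regression. The feature (utterance length) and all values are placeholders, not the paper's data.

```python
# Sketch: test whether a linguistic feature differs across the two
# grounding conditions, then classify the condition from features.
# Feature values are invented for illustration.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LogisticRegression

shared = np.array([[5.0], [6.0], [4.5], [5.5]])    # referent visible to both
unshared = np.array([[8.0], [9.0], [7.5], [8.5]])  # referent visible to one

stat, p = mannwhitneyu(shared.ravel(), unshared.ravel())
print(f"Mann-Whitney U = {stat}, p = {p:.3f}")

X = np.vstack([shared, unshared])
y = [0] * len(shared) + [1] * len(unshared)
clf = LogisticRegression().fit(X, y)
print(clf.predict([[6.0], [9.0]]))  # predict conditions for new utterances
```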
{"title":"Grounded and Ungrounded Referring Expressions in Human Dialogues: Language Mirrors Different Grounding Conditions","authors":"Eleonora Gualdoni, R. Bernardi, R. Fernández, Sandro Pezzelle","doi":"10.4000/books.aaccademia.8600","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8600","url":null,"abstract":"We study how language use differs between dialogue partners in a visually grounded reference task when a referent is mutually identifiable by both interlocutors vs. when it is only available to one of them. In the latter case, the addressee needs to disconfirm a proposed description – a skill largely neglected by both the theoretical and the computational linguistics communities. We consider a number of linguistic features that we expect to vary across conditions. We then analyze their effectiveness in distinguishing among the two conditions by means of statistical tests and a feature-based classifier. Overall, we show that language mirrors different grounding conditions, paving the way to future deeper investigation of referential disconfirmation.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126403252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' - Should compounds be lexicalized in NLP?
DOI: 10.4000/books.aaccademia.8979
Linda Wiechetek, Chiara Argese, Flammie A. Pirinen, Trond Trosterud
Lexicalizing compounds, in addition to handling them dynamically, is a key element for obtaining idiomatic translations and for detecting errors in compounds. We present and evaluate an e-dictionary (NDS) and a grammar checker (GramDivvun) for North Sámi. We obtain a coverage of 98% for lookups in NDS and of 96% for compound-error detection in GramDivvun.
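The paper's tools are finite-state; purely as an illustration of the lexicalized-vs-dynamic distinction it weighs, here is a toy splitter that first checks a lexicon of lexicalized compounds and only then falls back to dynamic decomposition. All words, the lexicon, and the greedy splitting logic are invented.

```python
# Toy illustration of lexicalized vs. dynamic compound handling:
# look the whole compound up first; fall back to greedy splitting.
LEXICALIZED = {"haystack": ("hay", "stack")}   # invented lexicon entry
STEMS = {"hay", "blood", "ink", "stack"}       # invented stem list

def analyze(compound: str):
    if compound in LEXICALIZED:                # lexicalized: a single lookup
        return LEXICALIZED[compound]
    parts, rest = [], compound                 # dynamic: greedy left-to-right
    while rest:
        for i in range(len(rest), 0, -1):
            if rest[:i] in STEMS:
                parts.append(rest[:i])
                rest = rest[i:]
                break
        else:
            return None                        # no analysis found
    return tuple(parts)

print(analyze("haystack"))   # ('hay', 'stack') via the lexicon
print(analyze("bloodink"))   # ('blood', 'ink') via dynamic splitting
```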
{"title":"Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' -Should compounds be lexicalized in NLP?","authors":"Linda Wiechetek, Chiara Argese, Flammie A. Pirinen, Trond Trosterud","doi":"10.4000/books.aaccademia.8979","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8979","url":null,"abstract":"La lessicalizzazione delle parole composte, in aggiunta a trattarle in maniera dinamica, è un elemento chiave per ottenere traduzioni idiomatiche e rilevare errori nelle stesse. Presentiamo e valutiamo un e-dizionario (NDS) e un correttore grammaticale (GramDivvun) per il Sami del Nord. Otteniamo una copertura del 98% per le ricerche in NDS e del 96% per il rilevamento di errori nelle parole composte in GramDivvun.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126407974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems
DOI: 10.4000/books.aaccademia.8648
Samuel Louvan, B. Magnini
Data augmentation has shown potential in alleviating data scarcity for Natural Language Understanding (e.g., slot filling and intent classification) in task-oriented dialogue systems. As prior work has mostly experimented on English datasets, we focus on five different languages and consider a setting where limited data are available. We investigate the effectiveness of non-gradient-based augmentation methods involving simple text-span substitutions and syntactic manipulations. Our experiments show that (i) augmentation is effective in all cases, particularly for slot filling, and (ii) it is beneficial for a joint intent-slot model based on multilingual BERT, both in limited-data settings and when the full training data is used.
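As a hedged sketch of the simplest augmentation family the paper mentions, slot-value substitution can be illustrated as follows: each annotated slot value in an utterance is swapped for an alternative of the same slot type, leaving the slot labels intact. The utterance, slot types, and value lists below are invented examples, not the paper's data.

```python
# Sketch of slot-value substitution for NLU data augmentation:
# replace each annotated slot value with a random alternative of
# the same slot type, keeping the annotation intact. Toy data only.
import random

ALTERNATIVES = {"city": ["Roma", "Milano", "Torino"],
                "time": ["alle otto", "a mezzogiorno"]}  # invented values

def augment(spans_with_slots):
    """spans_with_slots: list of (text span, slot type or None)."""
    out = []
    for span, slot in spans_with_slots:
        if slot in ALTERNATIVES:
            span = random.choice(ALTERNATIVES[slot])  # substitute the value
        out.append((span, slot))
    return out

utterance = [("treni per", None), ("Roma", "city"), ("alle otto", "time")]
print(augment(utterance))  # e.g. [('treni per', None), ('Milano', 'city'), ...]
```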
{"title":"Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems","authors":"Samuel Louvan, B. Magnini","doi":"10.4000/books.aaccademia.8648","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8648","url":null,"abstract":"Data augmentation has shown potential in alleviating data scarcity for Natural Language Understanding (e.g. slot filling and intent classification) in task-oriented dialogue systems. As prior work has been mostly experimented on English datasets, we focus on five different languages, and consider a setting where limited data are available. We investigate the effectiveness of non-gradient based augmentation methods, involving simple text span substitutions and syntactic manipulations. Our experiments show that (i) augmentation is effective in all cases, particularly for slot filling; and (ii) it is beneficial for a joint intent-slot model based on multilingual BERT, both for limited data settings and when full training data is used.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130389302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multifunctional ISO Standard Dialogue Act Tagging in Italian
DOI: 10.4000/books.aaccademia.8860
G. Roccabruna, Alessandra Cervone, G. Riccardi
The task of Dialogue Act (DA) tagging, a crucial component of many conversational agents, is often addressed assuming a single DA per speaker turn in the conversation. However, speakers' turns are often multifunctional, that is, they can contain more than one DA (e.g., "I'm Alex. Have we met before?" contains a 'statement' followed by a 'question'). This work focuses on multifunctional DA tagging in Italian. First, we present iLISTEN2ISO, a novel resource with multifunctional DA annotation in Italian, created by annotating the iLISTEN corpus with the ISO standard. We provide an analysis of the corpus showing the importance of multifunctionality for DA tagging. Additionally, we train DA taggers for Italian on iLISTEN (achieving state-of-the-art results) and on iLISTEN2ISO. Our findings indicate the importance of a multifunctional approach to DA tagging.
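To make the multifunctionality point concrete, a minimal sketch follows of how a single turn can be represented as a sequence of functional segments, each carrying its own ISO-style DA tag. The segmentation and tag names loosely echo ISO 24617-2 naming and are illustrative assumptions, not the iLISTEN2ISO scheme itself.

```python
# Sketch: one speaker turn carrying two dialogue acts, represented as
# functional segments. Tag names are an illustrative assumption.
from dataclasses import dataclass

@dataclass
class FunctionalSegment:
    text: str
    dialogue_act: str

turn = [
    FunctionalSegment("I'm Alex.", "inform"),                       # statement
    FunctionalSegment("Have we met before?", "propositionalQuestion"),  # question
]
for seg in turn:
    print(f"{seg.dialogue_act:>22}: {seg.text}")
```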
{"title":"Multifunctional ISO Standard Dialogue Act Tagging in Italian","authors":"G. Roccabruna, Alessandra Cervone, G. Riccardi","doi":"10.4000/books.aaccademia.8860","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8860","url":null,"abstract":"English. The task of Dialogue Act (DA) tagging, a crucial component in many conversational agents, is often addressed assuming a single DA per speaker turn in the conversation. However, speakers’ turns are often multifunctional, that is they can contain more than one DA (i.e. “I’m Alex. Have we met before?” contains a ‘state-ment’, followed by a ‘question’). This work focuses on multifunctional DA tagging in Italian. First, we present iLIS-TEN2ISO, a novel resource with multi-functional DA annotation in Italian, created by annotating the iLISTEN corpus with the ISO standard. We provide an analysis of the corpus showing the importance of multifunctionality for DA tagging. Additionally, we train DA taggers for Italian on iLISTEN (achieving State of the Art results) and iLISTEN2ISO. Our findings indicate the importance of using a multifunctional approach for DA tagging.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130731537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Building a Treebank in Universal Dependencies for Italian Sign Language
DOI: 10.4000/books.aaccademia.8320
Gaia Caligiore, C. Bosco, A. Mazzei
Italian Sign Language (LIS) is the natural language used by the Italian Deaf community. This paper discusses the application of the Universal Dependencies (UD) format to the syntactic annotation of a LIS corpus. The investigation aims in particular at contributing to sign language research by addressing the challenges that the visual-manual modality of LIS poses for linguistic annotation in general, and for segmentation and syntactic analysis in particular. We address two case studies from the storytelling domain, which were first segmented on the ELAN platform and then syntactically annotated in the CoNLL-U format.
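For readers unfamiliar with the target format, a minimal sketch follows of a glossed sign sequence encoded as CoNLL-U and parsed with a few lines of Python. The glosses and dependency relations are invented for illustration and are not drawn from the treebank.

```python
# Sketch: a toy CoNLL-U fragment for a glossed sign sequence, parsed
# into (id, form, head, deprel). Glosses and relations are invented.
CONLLU = (
    "1\tIX-3\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tBUY\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tBOOK\t_\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)

for line in CONLLU.strip().splitlines():
    cols = line.split("\t")               # the 10 standard CoNLL-U columns
    idx, form, head, deprel = cols[0], cols[1], cols[6], cols[7]
    print(f"{idx}: {form} --{deprel}--> head {head}")
```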
{"title":"Building a Treebank in Universal Dependencies for Italian Sign Language","authors":"Gaia Caligiore, C. Bosco, A. Mazzei","doi":"10.4000/books.aaccademia.8320","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8320","url":null,"abstract":"The Italian Sign Language (LIS) is the natural language used by the Italian Deaf community. This paper discusses the application of the Universal Dependencies (UD) format to the syntactic annotation of a LIS corpus. This investigation aims in particular at contributing to sign language research by addressing the challenges that the visual-manual modality of LIS creates generally in linguistic annotation and specifically in segmentation and syntactic analysis. We addressed two case studies from the storytelling domain first segmented on the ELAN platform, and second syntactically annotated using CoNLLU format.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131145054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributional Semantics: Yesterday, Today, and Tomorrow
DOI: 10.4000/books.aaccademia.9030
Alessandro Lenci
Distributional semantics is undoubtedly the mainstream approach to meaning representation in computational linguistics today. It has also become an important paradigm of semantic analysis in cognitive science, and even linguists have started looking at it with growing interest. The popularity of distributional semantics has literally boomed in the era of deep learning, when "word embeddings" have become the basic ingredient to "cook" any NLP task. The era of BERT & co. has brought new types of contextualized representations that have often generated hasty claims of incredible breakthroughs in the natural language understanding capabilities of deep learning models. Unfortunately, these claims are not always supported by the improved semantic abilities of the latest generation of embeddings. Models like BERT are still rooted in the principles of distributional learning, but at the same time their goal is more ambitious than generating corpus-based representations of meaning. On the one hand, the embeddings they produce encode much more than lexical meaning; on the other hand, we are still largely uncertain about which semantic properties of natural language they actually capture. Distributional semantics has surely benefited from the successes of deep learning, but this may even jeopardize the very essence of distributional models of meaning by making their goals and foundations unclear.
{"title":"Distributional Semantics: Yesterday, Today, and Tomorrow","authors":"Alessandro Lenci","doi":"10.4000/books.aaccademia.9030","DOIUrl":"https://doi.org/10.4000/books.aaccademia.9030","url":null,"abstract":"Distributional semantics is undoubtedly the mainstream approach to meaning representation in computational linguistics today. It has also become an important paradigm of semantic analysis in cognitive science, and even linguists have started looking at it with growing interest. The popularity of distributional semantics has literally boomed in the era of Deep Learning, when “word embeddings” have become the basic ingredient to “cook” any NLP task. The era of BERT & co. has brought new types of contextualized representations that have often generated hasty claims of incredible breakthroughs in the natural language understanding capability of deep learning models. Unfortunately, these claims are not always supported by the improved semantic abilities of the last generation of embeddings. Models like BERT are still rooted in the principles of distributional learning, but at the same time their goal is more ambitious than generating corpus-based representations of meaning. On the one hand, the embeddings they produce encode much more than lexical meaning, but on the other hand we are still largely uncertain about what semantic properties of natural language they actually capture. Distributional semantics has surely benefited from the successes of the deep learning, but this might even jeopardize the very essence of distributional models of meaning, by making their goals and foundations unclear.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134520385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}