Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8920
F. Tamburini
The use of contextualised word embeddings has brought a substantial performance increase to almost all Natural Language Processing (NLP) applications. Recently, some new models developed specifically for Italian have become available to scholars. This work evaluates the impact of these models on application performance for Italian, establishing the new state of the art for some fundamental NLP tasks.
Title : How "BERTology" Changed the State-of-the-Art also for Italian NLP
Journal : Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8910
R. Sprugnoli
This paper presents a new linguistic resource for Italian, called MultiEmotions-It, containing comments on music videos and advertisements posted on YouTube and Facebook. These comments are manually annotated along four dimensions: relatedness, opinion polarity, emotions and sarcasm. For the annotation of emotions we adopted Plutchik’s model, taking into account both basic and complex emotions (i.e., dyads).
Title : MultiEmotions-It: a New Dataset for Opinion Polarity and Emotion Analysis for Italian
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8565
G. Franzini, Federica Zampedri, M. Passarotti, Francesco Mambrini, Giovanni Moretti
This paper describes the addition of an index of 1,763 Ancient Greek loanwords to the collection of Latin lemmas of the LiLa: Linking Latin Knowledge Base of interoperable linguistic resources. This lexical resource increases LiLa’s lemma count and tunes its underlying data model to etymological borrowing.
Title : Græcissare: Ancient Greek Loanwords in the LiLa Knowledge Base of Linguistic Resources for Latin
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8964
Marco Vassallo, G. Gabrieli, Valerio Basile, C. Bosco
Polarity imbalance is an asymmetric situation that occurs when parametric threshold values are used in lexicon-based Sentiment Analysis (SA). Varying the thresholds may have opposite effects on the prediction of negative and positive polarity. We hypothesize that this may be due to asymmetries in the data, in the lexicon, or in both. We therefore carry out experiments to evaluate the effect of the lexicon and of the topics addressed in the data. Our experiments are based on a weighted version of the Italian linguistic resource MAL (Morphologically-inflected Affective Lexicon), using as weighting corpus TWITA, a large-scale corpus of Twitter messages in Italian. The novel Weighted-MAL (W-MAL), presented for the first time in this paper, achieved better polarity classification results, especially for negative tweets, while also alleviating the aforementioned polarity imbalance.
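As a rough illustration of what a parametric threshold means in lexicon-based polarity classification, the following is a minimal sketch. The toy lexicon, its scores, the mean-score aggregation and the symmetric threshold are all assumptions made for this example; they are not taken from MAL, W-MAL or the experiments described in the abstract.

```python
from typing import Dict, List

# Hypothetical toy lexicon: token -> polarity score in [-1, 1].
# These entries and values are illustrative only, not drawn from MAL or W-MAL.
TOY_LEXICON: Dict[str, float] = {
    "ottimo": 0.8,
    "bello": 0.5,
    "brutto": -0.5,
    "pessimo": -0.9,
}


def polarity_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Mean score of the tokens found in the lexicon (0.0 if none are found)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def classify(tokens: List[str], lexicon: Dict[str, float], threshold: float) -> str:
    """Label a message using a symmetric threshold (an assumption of this sketch)."""
    score = polarity_score(tokens, lexicon)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Raising `threshold` moves more messages into the neutral class on both sides. If the scores in a lexicon are not distributed symmetrically around zero, the same change can remove more positive than negative predictions or the reverse, which is one way the kind of imbalance described above could arise.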
Title : Polarity Imbalance in Lexicon-based Sentiment Analysis
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8870
Emma Romani, Elisabetta Jezek
In this paper we present the main issues and results of a research thesis (Romani, 2020) dedicated to the annotation of metonymies in T-PAS, a corpus-based digital repository of Italian verbal patterns (Ježek et al., 2014). The annotation was performed on the corpus instances of a selected list of 30 verbs and was aimed both at enriching the resource with metonymic patterns and at identifying and mapping the metonymic relations that occur in the verbal patterns. The annotated corpus data (consisting of 1,218 corpus instances), the patterns, and the relations can be useful for NLP tasks such as metonymy recognition.
Title : Tracing Metonymic Relations in T-PAS: An Annotation Exercise on a Corpus-based Resource for Italian
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8243
Vevake Balaraman, B. Magnini
Proactivity (i.e., the capacity to provide useful information even when not explicitly required) is a fundamental characteristic of human dialogues. Although current task-oriented dialogue systems are good at providing information explicitly requested by the user, they are poor at exhibiting proactivity, which is typical of human-human interactions. In this study, we investigate the presence of proactive behaviours in several available dialogue collections, both human-human and human-machine, and show how the data acquisition decision affects the proactive behaviour present in the dataset. We adopt a two-step approach to semi-automatically detect proactive situations in datasets where proactivity is not annotated, and show that dialogues collected with approaches that give more freedom to the agent/user exhibit high proactivity.
Title : Investigating Proactivity in Task-Oriented Dialogues
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8945
A. Uva, Pierluigi Roberti, Alessandro Moschitti
Modern personal assistants need to access unstructured information in order to fulfill user requests successfully. In this paper, we study the use of two machine learning components for designing personal assistants: intent classification, to understand the user request, and answer sentence selection, to carry out question answering from unstructured text. The evaluation results, obtained on five different real-world datasets associated with different companies, show high accuracy for both tasks. This suggests that modern QA and dialog technology is effective for real-world tasks.
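To make the answer sentence selection task concrete, here is a minimal sketch that ranks candidate sentences against a question. It uses plain token overlap (Jaccard similarity) as a stand-in scoring function; this is an assumption chosen for brevity and is not the machine learning model evaluated in the paper. The function names and the example sentences are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_answer_sentence(question: str, candidates: List[str]) -> str:
    """Return the candidate sentence with the highest overlap with the question."""
    if not candidates:
        raise ValueError("at least one candidate sentence is required")
    question_tokens = tokenize(question)
    return max(candidates, key=lambda s: jaccard(question_tokens, tokenize(s)))
```

For example, given the question "How do I reset my password?" and the candidates "Our offices are open from nine to five." and "To reset your password, open the settings page.", the function returns the second sentence, since only that one shares tokens with the question. A trained ranker would replace `jaccard` with a learned relevance score.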
Title : Dialog-based Help Desk through Automated Question Answering and Intent Detection
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8463
Luca Di Liello, Daniele Bonadiman, Alessandro Moschitti, Cristina Giannone, A. Favalli, Raniero Romagnoli
Transfer learning has proven effective, especially when data for the target domain/task is scarce. Sometimes data for a similar task is only available in another language because the task may be very specific. In this paper, we explore the use of machine-translated data to transfer models to a related domain. Specifically, we transfer models from the question duplication task (QDT) to similar FAQ selection tasks. The source domain is the well-known English Quora dataset, while the target domain is a collection of small Italian datasets for real-case scenarios, consisting of FAQ groups retrieved by pivoting on common answers. Our results show great improvements in the zero-shot learning setting and modest improvements using the standard transfer approach for direct in-domain adaptation.
Title : Cross-Language Transformer Adaptation for Frequently Asked Questions
Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8745
Alessio Miaschi, Gabriele Sarti, D. Brunato, F. Dell’Orletta, Giulia Venturi
In this paper we present an in-depth investigation of the linguistic knowledge encoded by the transformer models currently available for the Italian language. In particular, we investigate whether and how using different architectures of probing models affects the performance of Italian transformers in encoding a wide spectrum of linguistic features. Moreover, we explore how this implicit knowledge varies according to different textual genres.
Title : Italian Transformers Under the Linguistic Lens