Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8450
Mirko Di Lascio, M. Sanguinetti, Luca Anselma, Dario Mana, A. Mazzei, V. Patti, R. Simeoni
In this paper we discuss the role of natural language generation (NLG) in modern dialogue systems (DSs). In particular, we study the role that a linguistically sound NLG architecture can play in a DS. Using real examples from a new corpus of dialogues in the customer-care domain, we study how non-linguistic contextual data can be exploited by NLG.
Title: Natural Language Generation in Dialogue Systems for Customer Care (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8353
Greta Gandolfi, C. Strapparava
Ostracism is a community-level phenomenon, shared by most social animals, including humans. Its detection plays a crucial role for the individual, with possible evolutionary consequences for the species. Considering (1) its bond with communication and (2) its social nature, we hypothesise that combining (a) linguistic and (b) community-level features has a positive impact on the automatic recognition of ostracism in human online communities. We model an English linguistic community through Reddit data and analyse the performance of simple classification algorithms. We show that models based on the combination of (a) and (b) generally outperform the same architectures when fed (a) or (b) in isolation.
Title: Predicting Social Exclusion: A Study of Linguistic Ostracism in Social Networks (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8675
Raffaele Manna, A. Pascucci, Wanda Punzi Zarino, Vincenzo Simoniello, J. Monti
This paper presents the results of research carried out on the UNIOR Eye corpus, which was built by downloading tweets related to environmental crimes. The corpus is made up of 228,412 tweets organized into four sub-sections, each concerning a specific environmental crime. For the current study we focused on the waste crimes sub-section, composed of 86,206 tweets tagged with one of two labels, "alert" and "no alert". The aim is to build a model able to detect which class a tweet belongs to.
Title: Monitoring Social Media to Identify Environmental Crimes through NLP. A preliminary study (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8403
Ilaria Colucci, Elisabetta Jezek, V. Baisa
As highlighted by Pustejovsky (1995, 2002), the semantics of each verb is determined by the totality of its complementation patterns. Arguments in fact play a fundamental role in verb meaning and verbal polysemy, thanks to the sense co-composition principle between verb and argument. For this reason, clustering the lexical items that fill the Object slot of a verb is believed to bring to the surface relevant information about verbal meaning and the verb-Object relation. The paper presents the results of an experiment comparing the automatic clustering of direct Objects performed by the agglomerative hierarchical algorithm of the Sketch Engine corpus tool with the manual clustering of direct Objects carried out in the T-PAS resource. Cluster analysis is used here both to improve the semantic quality of automatic clusters against expert human intuition and as a tool for investigating phenomena intrinsic to the semantic selection of verbs and the construction of verb senses in context.
Title: Clustering verbal Objects: Manual and Automatic Procedures Compared (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8280
Davide Biasion, Alessandro Fabris, Gianmaria Silvello, Gian Antonio Susto
In this work we study gender bias in Italian word embeddings (WEs), evaluating whether they encode gender stereotypes studied in social psychology or present in the labor market. We find strong associations with gender in job-related WEs. Weaker gender stereotypes are present in other domains where grammatical gender plays a significant role.
Title: Gender Bias in Italian Word Embeddings (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8728
Enrico Mensa, G. Marino, Davide Colla, Matteo Delsanto, Daniele P. Radicioni
English. In this paper we propose a method for collecting a dictionary to deal with noisy medical text documents. The quality of Italian Emergency Room Reports is so poor that in most cases they can hardly be processed automatically; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap. Italian (translated). In this work we describe a method for building a dictionary dedicated to processing medical documents, namely the portion of clinical records annotated in emergency departments. Documents of this type are so noisy that clinical records can generally hardly be processed automatically as they are. Although cleaning up this type of document is a relevant and widespread problem, no comprehensive dictionary existed for handling this domain-specific language. In this work we propose and evaluate a resource aimed at carrying out this kind of processing on clinical records.
Title: A Resource for Detecting Misspellings and Denoising Medical Text Data (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8743
Alessio Miaschi, Chiara Alzetta, D. Brunato, F. Dell’Orletta, Giulia Venturi
This paper explores the relationship between Neural Language Model (NLM) perplexity and sentence readability. Starting from the evidence that NLMs implicitly acquire sophisticated linguistic knowledge from a huge amount of training data, our goal is to investigate whether perplexity is affected by the linguistic features used to automatically assess sentence readability and whether there is a correlation between the two metrics. Our findings suggest that this correlation is actually quite weak and that the two metrics are affected by different linguistic phenomena.
Title: Is Neural Language Model Perplexity Related to Readability? (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8718
Andrea Mattei, D. Brunato, F. Dell’Orletta
This paper presents a new corpus for the Italian language representative of the fanfiction genre. It comprises about 55k user-generated stories inspired by the original fantasy saga “Harry Potter” and published on a popular website. The corpus is large enough to support data-driven investigations in many directions, from more traditional studies on language variation aimed at characterizing this genre with respect to more traditional ones, to emerging topics in computational social science such as the identification of factors involved in the success of a story. The latter is the focus of the presented case study, in which a wide set of multi-level linguistic features was automatically extracted from a subset of the corpus and analysed in order to detect those which significantly discriminate successful from unsuccessful
Title: The Style of a Successful Story: a Computational Study on the Fanfiction Genre (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)
Pub Date: 1900-01-01 DOI: 10.4000/books.aaccademia.8485
Martina Ducret, Lauren Kruse, Carlos Martinez, Anna Feldman, Jing Peng
We explore linguistic features that contribute to sarcasm detection. The linguistic features that we investigate are a combination of text and word complexity, stylistic and psychological features. We experiment with sarcastic tweets with and without context. The results of our experiments indicate that contextual information is crucial for sarcasm prediction. One important observation is that sarcastic tweets are typically incongruent with their context in terms of sentiment or emotional load.
Title: You Don’t Say… Linguistic Features in Sarcasm Detection (Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020)