Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8515
Manuel Favaro, M. Biffi, Simonetta Montemagni
{"title":"Risorse linguistiche di varietà storiche di italiano: il progetto TrAVaSI","authors":"Manuel Favaro, M. Biffi, Simonetta Montemagni","doi":"10.4000/books.aaccademia.8515","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8515","url":null,"abstract":"","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131071888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8620
Alina Karakanta, Matteo Negri, M. Turchi
Subtitles, in order to achieve their purpose of transmitting information, need to be easily readable. The segmentation of subtitles into phrases or linguistic units is key to their readability and comprehension. However, automatically segmenting a sentence into subtitles is a challenging task and data containing reliable human segmentation decisions are often scarce. In this paper, we leverage data with noisy segmentation from large subtitle corpora and combine them with smaller amounts of high-quality data in order to train models which perform automatic segmentation of a sentence into subtitles. We show that even a minimum amount of reliable data can lead to readable subtitles and that quality is more important than quantity for the task of subtitle segmentation.1
{"title":"Point Break: Surfing Heterogeneous Data for Subtitle Segmentation","authors":"Alina Karakanta, Matteo Negri, M. Turchi","doi":"10.4000/books.aaccademia.8620","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8620","url":null,"abstract":"Subtitles, in order to achieve their purpose of transmitting information, need to be easily readable. The segmentation of subtitles into phrases or linguistic units is key to their readability and comprehension. However, automatically segmenting a sentence into subtitles is a challenging task and data containing reliable human segmentation decisions are often scarce. In this paper, we leverage data with noisy segmentation from large subtitle corpora and combine them with smaller amounts of high-quality data in order to train models which perform automatic segmentation of a sentence into subtitles. We show that even a minimum amount of reliable data can lead to readable subtitles and that quality is more important than quantity for the task of subtitle segmentation.1","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126312546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8940
Rocco Tripodi
English. This paper presents a new topic modelling framework inspired by game theoretic principles. It is formulated as a normal form game in which words are represented as players and topics as strategies that the players select. The strategies of each player are modelled with a probability distribution guided by a utility function that the players try to maximize. This function induces players to select strategies similar to those selected by similar players and to choice strategies not shared with those selected by dissimilar players. The proposed framework is compared with state-of-the-art models demonstrating good performances on stan-
{"title":"Topic Modelling Games","authors":"Rocco Tripodi","doi":"10.4000/books.aaccademia.8940","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8940","url":null,"abstract":"English. This paper presents a new topic modelling framework inspired by game theoretic principles. It is formulated as a normal form game in which words are represented as players and topics as strategies that the players select. The strategies of each player are modelled with a probability distribution guided by a utility function that the players try to maximize. This function induces players to select strategies similar to those selected by similar players and to choice strategies not shared with those selected by dissimilar players. The proposed framework is compared with state-of-the-art models demonstrating good performances on stan-","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132176502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8225
Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta
In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.
{"title":"A Machine Learning approach for Sentiment Analysis for Italian Reviews in Healthcare","authors":"Luca Bacco, Andrea Cimino, L. Paulon, M. Merone, F. Dell’Orletta","doi":"10.4000/books.aaccademia.8225","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8225","url":null,"abstract":"In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123407521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.9020
S. Kopp
{"title":"Interaction-aware multimodal dialogue with conversational agents","authors":"S. Kopp","doi":"10.4000/books.aaccademia.9020","DOIUrl":"https://doi.org/10.4000/books.aaccademia.9020","url":null,"abstract":"","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115709030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8695
Claudia Marzi, Anna Rodella, Andrea Nadalini, Loukia Taxitari, Vito Pirrelli
The movement of a child’s index finger that points to a printed text while (s)he is reading may provide a proxy for the child’s eye movements and attention focus. We validated this correlation by showing a quantitative analysis of patterns of “finger-tracking” of Italian early graders engaged in reading a text displayed on a tablet. A web application interfaced with the tablet monitors the reading behaviour by modelling the way the child points to the text while reading. The analysis found significant developmental trends in reading strategies, marking an interesting contrast between typically developing and atypically developing readers.
{"title":"Does Finger-Tracking Point to Child Reading Strategies?","authors":"Claudia Marzi, Anna Rodella, Andrea Nadalini, Loukia Taxitari, Vito Pirrelli","doi":"10.4000/books.aaccademia.8695","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8695","url":null,"abstract":"The movement of a child’s index finger that points to a printed text while (s)he is reading may provide a proxy for the child’s eye movements and attention focus. We validated this correlation by showing a quantitative analysis of patterns of “finger-tracking” of Italian early graders engaged in reading a text displayed on a tablet. A web application interfaced with the tablet monitors the reading behaviour by modelling the way the child points to the text while reading. The analysis found significant developmental trends in reading strategies, marking an interesting contrast between typically developing and atypically developing readers.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133485950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8825
Alessio Palmero Aprosio, S. Menini, Sara Tonelli
English. While text-only datasets are widely produced and used for research purposes, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal data. We therefore developed CREENDER, an annotation tool to create multimodal datasets with images associated with semantic tags and comments, which we make freely available under Apache 2.0 license. The software has been extensively tested with school classes, allowing us to improve the tool and add useful features not planned in the first development phase.1 Italiano. Mentre i dataset testuali sono ampiamenti creati e usati per scopi di ricerca, le limitazioni imposte dai social media basati sulle immagini (come Instagram) rendono difficile per i ricercatori sperimentare con dati multimodali. Abbiamo quindi sviluppato CREENDER, un tool di annotazione per la creazione di dataset multimodali in cui immagini vengono associate a etichette semantiche e commenti, e che abbiamo reso disponibile gratuitamente con la licenza Apache 2.0. Il software è stato testato in un laboratorio con alcune classi scolastiche, permettendoci di ottimizzare alcune procedure e di aggiungere feature non previste nella
{"title":"The CREENDER Tool for Creating Multimodal Datasets of Images and Comments","authors":"Alessio Palmero Aprosio, S. Menini, Sara Tonelli","doi":"10.4000/books.aaccademia.8825","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8825","url":null,"abstract":"English. While text-only datasets are widely produced and used for research purposes, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal data. We therefore developed CREENDER, an annotation tool to create multimodal datasets with images associated with semantic tags and comments, which we make freely available under Apache 2.0 license. The software has been extensively tested with school classes, allowing us to improve the tool and add useful features not planned in the first development phase.1 Italiano. Mentre i dataset testuali sono ampiamenti creati e usati per scopi di ricerca, le limitazioni imposte dai social media basati sulle immagini (come Instagram) rendono difficile per i ricercatori sperimentare con dati multimodali. Abbiamo quindi sviluppato CREENDER, un tool di annotazione per la creazione di dataset multimodali in cui immagini vengono associate a etichette semantiche e commenti, e che abbiamo reso disponibile gratuitamente con la licenza Apache 2.0. Il software è stato testato in un laboratorio con alcune classi scolastiche, permettendoci di ottimizzare alcune procedure e di aggiungere feature non previste nella","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133023298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8839
Andrea Amelio Ravelli, A. Origlia, F. Dell’Orletta
This paper explores the possibility to annotate engagement as an extra-linguistic information in a multimodal corpus of guided tours in cultural sites. Engagement has been annotated in terms of gain or loss of perceived attention from the audience, and this information has been aligned to the transcription of the speech from the guide. A preliminary analysis suggests that the level of engagement correlates with some specific linguistic features, opening up to possible future exploitation.
{"title":"Exploring Attention in a Multimodal Corpus of Guided Tours","authors":"Andrea Amelio Ravelli, A. Origlia, F. Dell’Orletta","doi":"10.4000/books.aaccademia.8839","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8839","url":null,"abstract":"This paper explores the possibility to annotate engagement as an extra-linguistic information in a multimodal corpus of guided tours in cultural sites. Engagement has been annotated in terms of gain or loss of perceived attention from the audience, and this information has been aligned to the transcription of the speech from the guide. A preliminary analysis suggests that the level of engagement correlates with some specific linguistic features, opening up to possible future exploitation.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124624650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8443
Andrea Gregor de Varda, C. Strapparava
The present paper aims to investigate the nature and the extent of cross-linguistic phonosemantic correspondences within a computational framework. An LSTMbased Recurrent Neural Network is trained to associate the phonetic representation of a word, encoded as a sequence of feature vectors, to its corresponding semantic representation in a multilingual vector space. The processing network is tested, without further training, in a language that does not appear in the training set. The performance of the multilingual model is compared with a monolingual upper bound and a randomized baseline. After the quantitative evaluation of its performance, a qualitative analysis is carried out on the network’s most effective predictions, showing an inhomogeneous distribution of phonosemantic information in the lexicon, influenced by semantic, syntactic, and pragmatic factors.
{"title":"Phonological Layers of Meaning: A Computational Exploration of Sound Iconicity","authors":"Andrea Gregor de Varda, C. Strapparava","doi":"10.4000/books.aaccademia.8443","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8443","url":null,"abstract":"The present paper aims to investigate the nature and the extent of cross-linguistic phonosemantic correspondences within a computational framework. An LSTMbased Recurrent Neural Network is trained to associate the phonetic representation of a word, encoded as a sequence of feature vectors, to its corresponding semantic representation in a multilingual vector space. The processing network is tested, without further training, in a language that does not appear in the training set. The performance of the multilingual model is compared with a monolingual upper bound and a randomized baseline. After the quantitative evaluation of its performance, a qualitative analysis is carried out on the network’s most effective predictions, showing an inhomogeneous distribution of phonosemantic information in the lexicon, influenced by semantic, syntactic, and pragmatic factors.","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"7 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01DOI: 10.4000/books.aaccademia.8210
Chiara Alzetta, F. Dell’Orletta, S. Montemagni, P. Osenova, K. Simov, Giulia Venturi
The paper illustrates a case study aimed at identifying cross-lingual quantitative trends in the distribution of dependency relations in treebanks for typologically different languages. Preliminary results show interesting differences rooted either in language-specific peculiarities or crosslingual annotation inconsistencies, with a potential impact on different application scenarios. 1
{"title":"Quantitative Linguistic Investigations across Universal Dependencies Treebanks","authors":"Chiara Alzetta, F. Dell’Orletta, S. Montemagni, P. Osenova, K. Simov, Giulia Venturi","doi":"10.4000/books.aaccademia.8210","DOIUrl":"https://doi.org/10.4000/books.aaccademia.8210","url":null,"abstract":"The paper illustrates a case study aimed at identifying cross-lingual quantitative trends in the distribution of dependency relations in treebanks for typologically different languages. Preliminary results show interesting differences rooted either in language-specific peculiarities or crosslingual annotation inconsistencies, with a potential impact on different application scenarios. 1","PeriodicalId":300279,"journal":{"name":"Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130202740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}